PHP Regex for .vtt files


























I am looking to loop through existing .vtt files and read the cue data into a database.



The format of the .vtt files is:



WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line


Originally I tried to use ^ and $ to be quite regimented about the lines, along the lines of: /^(\w*)$^(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})$^(.+)$/ims, but I struggled to get this working in the regex checker and resorted to using \s to deal with line starts and ends.



Currently I am using the following regex: /(.*)\s(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s(.+)/im



This partially works in online regex checkers such as https://regex101.com/r/mmpObk/3 (this example does not pick up multi-line subtitles, but it does get the first line, which is good enough for my purpose at this point because all subtitles are currently one-liners). However, if I put this into PHP (preg_match_all("/(.*)\s(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s(.+)/mi", $fileData, $matches)) and dump the results, I get an array of empty arrays.



What might be different between the online regex and php?



Thanks in advance for any suggestions.



EDIT---
Below is a dump of $fileData and a dump of $matches:



string(341) "WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line"

array(11) {
[0]=>
array(0) {}
[1]=>
array(0) {}
[2]=>
array(0) {}
[3]=>
array(0) {}
[4]=>
array(0) {}
[5]=>
array(0) {}
[6]=>
array(0) {}
[7]=>
array(0) {}
[8]=>
array(0) {}
[9]=>
array(0) {}
[10]=>
array(0) {}
}

































  • Is all that data in the variable $fileData? So you don't get these matches: 3v4l.org/3CqC7
    – The fourth bird
    Nov 13 at 17:07










  • I dumped out the content of $fileData and everything looks correct to me - I have added the dumps to the original question for reference.
    – SGD
    Nov 13 at 17:26






















php regex webvtt














asked Nov 13 at 17:00 by SGD (edited Nov 13 at 17:25)
























1 Answer














The problem with your regular expression is poor line-ending handling.



You have this at the end: \s(.+)/mi.

\s matches exactly one whitespace character, but a line break can be one character or two.

To fix it, you can use \R(.+)/mi, since \R matches any line-break sequence.

It works on the website because the site normalizes your newlines into Linux-style newlines.

That is, Windows-style newlines are \r\n (2 characters) and Linux-style newlines are \n (1 character).
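As a rough illustration of the point above, the sketch below runs the pattern from the question (with its backslash escapes written out) against a cue that uses Windows line endings, then again after normalizing the line endings to \n. The sample string and the variable names $before, $normalized and $after are assumptions made for this example; they are not taken from the original file.

```php
<?php
// Hypothetical sample: one cue written with Windows (CRLF) line endings.
$fileData = "line1\r\n00:00:00.000 --> 00:00:10.000\r\nStuff\r\n";

// The pattern from the question, with the backslash escapes written out.
$pattern = '/(.*)\s(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s(.+)/im';

// With CRLF input the trailing \s consumes only "\r",
// and (.+) cannot start on the "\n" that follows.
$before = preg_match_all($pattern, $fileData, $matches);

// Normalize CRLF and lone CR to LF, then try again.
$normalized = str_replace(["\r\n", "\r"], "\n", $fileData);
$after = preg_match_all($pattern, $normalized, $matches);

var_dump($before, $after);
```

On this sample the first call should find no matches and the second should find one. Normalizing the input is an alternative to changing the pattern; it may be the simpler choice when the files come from mixed sources.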





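Alternatively, you can try this regular expression:

/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i

It looks horrible, but it works.

Note: I'm swapping between \R and \r\n because \R is not special inside a character class ([...]): depending on the PCRE version, it either matches a literal R there or is rejected as an error.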



The data is captured like this:




  1. Line number (if present)

  2. Initial timestamp

  3. Final timestamp

  4. Multiline text


You can try it on https://regex101.com/r/Yk8iD1/1



You can use the handy code generator tool to generate the following PHP:



$re = '/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i';
$str = 'WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
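Since the goal in the question is to load the cues into a database, each match can be turned into one row. The sketch below assumes the pattern from this answer with its backslash escapes written out; the function name parseCues and the array keys line, start, end and text are hypothetical, chosen only for illustration, and the actual database insert is left out.

```php
<?php
// Hypothetical helper: turns WebVTT text into rows ready for a database insert.
function parseCues(string $vtt): array
{
    // The pattern from the answer, with the backslash escapes written out.
    $re = '/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i';

    $rows = [];
    if (preg_match_all($re, $vtt, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $rows[] = [
                // Group 1 is an empty string when the cue has no "lineN" header.
                'line'  => $m[1] === '' ? null : (int) $m[1],
                'start' => $m[2],
                'end'   => $m[3],
                'text'  => $m[4],
            ];
        }
    }
    return $rows;
}

// Assumed sample with CRLF line endings, one cue with a header and one without.
$sample = "WEBVTT FILE\r\n\r\nline1\r\n00:00:00.000 --> 00:00:10.000\r\nStuff\r\n\r\n"
        . "00:00:10.000 --> 00:00:20.000\r\nOther stuff\r\nExample with 2 lines";

$rows = parseCues($sample);
var_dump($rows);
```

Multi-line cue text keeps its original line breaks inside the text value, so it may be worth normalizing them before storing the row.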


You can test it on http://sandbox.onlinephpfunctions.com/code/7f5362f56e912f3504ed075e7013071059cdee7b





























  • /(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i worked perfectly - thanks also for the explanation
    – SGD
    Nov 13 at 17:53










  • You're welcome, and thank you for the feedback. While I'm happy that you've accepted my answer, I always advise to wait 1-2 days before accepting, in case someone makes another answer.
    – Ismael Miguel
    Nov 13 at 18:00










  • I restored my intended captures as follows: /(?:(\w+)\R)?(\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}).(\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i
    – SGD
    Nov 13 at 18:04










  • Do you want me to fix the answer? I would have to re-do the examples and stuff.
    – Ismael Miguel
    Nov 13 at 18:13










  • I don't know what the SO protocol is - I had presumed that my comment above with the regex I am using would be enough for people to find and reference.
    – SGD
    Nov 14 at 15:41












































answered Nov 13 at 17:45 by Ismael Miguel (edited Nov 13 at 18:29)





























