PHP Regex for .vtt files
I am looking to loop through existing .vtt files and read the cue data into a database.
The format of the .vtt files is:
WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line
Originally I was trying to use ^ and $ to be quite strict about the lines, along the lines of: /^(\w*)$^(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})$^(.+)$/ims but I struggled to get this working in the regex checker and resorted to using \s to deal with line starts and ends.
Currently I am using the following regex: /(.*)\s(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s(.+)/im
This partially works in online regex checkers such as https://regex101.com/r/mmpObk/3 (this example does not pick up multi-line subtitles, but it does get the first line, which at this point is good enough for my purpose as all subtitles are currently one-liners). However, if I put this into PHP (preg_match_all("/(.*)\s(\d{2}):(\d{2}):(\d{2}).(\d{2,3}) --> (\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s(.+)/mi", $fileData, $matches)) and dump the results, I get an array of empty arrays.
What might be different between the online regex and php?
Thanks in advance for any suggestions.
EDIT---
Below is a dump of $fileData and a dump of $matches:
string(341) "WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line"
array(11) {
[0]=>
array(0) {}
[1]=>
array(0) {}
[2]=>
array(0) {}
[3]=>
array(0) {}
[4]=>
array(0) {}
[5]=>
array(0) {}
[6]=>
array(0) {}
[7]=>
array(0) {}
[8]=>
array(0) {}
[9]=>
array(0) {}
[10]=>
array(0) {}
}
php regex webvtt
Is all that data in the variable $fileData? So you don't get these matches: 3v4l.org/3CqC7
– The fourth bird
Nov 13 at 17:07
I dumped out the content of $fileData and everything looks correct to me - I have added the dumps to the original question for reference.
– SGD
Nov 13 at 17:26
edited Nov 13 at 17:25
asked Nov 13 at 17:00
SGD
1 Answer
The problem with your regular expression is poor line-ending handling.
You have this at the end: \s(.+)/mi.
This matches only one whitespace character, but a line break can be one or two characters.
To fix it, you can use \R(.+)/mi.
It works on the website because the site normalizes your line breaks to Linux-style newlines.
That is, Windows-style newlines are \r\n (2 characters) and Linux-style are \n (1 character).
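A minimal sketch of that difference, assuming the file uses Windows-style line endings (the 341-byte length in the question's dump is consistent with CRLF, but that is an inference, not something the question states). The $cue string and the shortened patterns are hypothetical stand-ins for illustration, not the full expressions from the question. The last step, normalizing line endings before matching, is an alternative that the answer itself does not propose:

```php
<?php
// Hypothetical one-cue sample with Windows-style (CRLF) line endings.
$cue = "00:00:00.000 --> 00:00:10.000\r\nStuff\r\n";

// \s consumes only the \r; (.+) then faces \n, which . does not match by default.
$withS = preg_match('/-->\s[\d:.]+\s(.+)/', $cue, $m1);

// \R consumes the whole \r\n pair (or a lone \n), so (.+) reaches the cue text.
$withR = preg_match('/-->\s[\d:.]+\R(.+)/', $cue, $m2);

// Alternative (not from the answer): convert every line break to \n first,
// after which the \s version matches as well.
$normalized = preg_replace('/\R/', "\n", $cue);
$afterNormalizing = preg_match('/-->\s[\d:.]+\s(.+)/', $normalized, $m3);

var_dump($withS, $withR, $afterNormalizing); // int(0), int(1), int(1)
```

With \R on unnormalized input, the captured text may still end in a stray \r, so trimming it (for example with rtrim) before storing it is probably worthwhile.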
Alternatively, you can try this regular expression:
/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i
It looks horrible, but it works.
Note: I'm swapping between \R and \r\n because \R does not work as a newline escape inside a character class ([...]); there it matches the literal R.
The data is captured like this:
- Line number (if present)
- Initial timestamp
- Final timestamp
- Multiline text
You can try it on https://regex101.com/r/Yk8iD1/1
You can use the handy code generator tool to generate the following PHP:
// Captures: 1 = line number (optional), 2 = start time, 3 = end time, 4 = cue text
$re = '/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i';
$str = 'WEBVTT FILE

line1
00:00:00.000 --> 00:00:10.000
‘Stuff’

line2
00:00:10.000 --> 00:00:20.000
Other stuff
Example with 2 lines

line3
00:00:20.00 --> 00:00:30.000
Example with only 2 digits in milliseconds

line4
00:00:30.000 --> 00:00:40.000
Different stuff

00:00:40.000 --> 00:00:50.000
Example without a head line';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
You can test it on http://sandbox.onlinephpfunctions.com/code/7f5362f56e912f3504ed075e7013071059cdee7b
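Since the goal in the question is to load the cues into a database, the capture order listed above could be mapped to one row per cue roughly as follows. This is only a sketch under assumptions: the group names (number, start, end, text), the two-cue $vtt sample, and the row layout are hypothetical, and the dot before the milliseconds is escaped here, which the pattern as shown does not do.

```php
<?php
// The answer's pattern, rewritten with named groups (the names are hypothetical).
$re = '/(?:line(?<number>\d+)\R)?'
    . '(?<start>\d{2}(?::\d{2}){2}\.\d{2,3})\s*-->\s*'
    . '(?<end>\d{2}(?::\d{2}){2}\.\d{2,3})\R'
    . '(?<text>(?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i';

// Hypothetical sample: CRLF line endings, a blank line between cues,
// one cue with a head line and one two-line cue without.
$vtt = "WEBVTT FILE\r\n\r\nline1\r\n00:00:00.000 --> 00:00:10.000\r\nStuff\r\n\r\n"
     . "00:00:10.000 --> 00:00:20.000\r\nOther stuff\r\nSecond line";

preg_match_all($re, $vtt, $matches, PREG_SET_ORDER);

$cues = [];
foreach ($matches as $m) {
    $cues[] = [
        // The optional head line comes back as an empty string when absent.
        'number' => $m['number'] === '' ? null : (int) $m['number'],
        'start'  => $m['start'],
        'end'    => $m['end'],
        // Store multi-line text with plain \n separators.
        'text'   => str_replace("\r\n", "\n", $m['text']),
    ];
}

print_r($cues);
```

Each element of $cues could then be bound to a prepared INSERT statement; the table and column names would depend on your schema.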
/(?:line(\d+)\R)?(\d{2}(?::\d{2}){2}.\d{2,3})\s*-->\s*(\d{2}(?::\d{2}){2}.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i worked perfectly - thanks also for the explanation
– SGD
Nov 13 at 17:53
You're welcome, and thank you for the feedback. While I'm happy that you've accepted my answer, I always advise to wait 1-2 days before accepting, in case someone makes another answer.
– Ismael Miguel
Nov 13 at 18:00
I restored my intended captures as follows: /(?:(\w+)\R)?(\d{2}):(\d{2}):(\d{2}).(\d{2,3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}).(\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)/i
– SGD
Nov 13 at 18:04
Do you want me to fix the answer? I would have to re-do the examples and stuff.
– Ismael Miguel
Nov 13 at 18:13
I don't know what the SO protocol is - I had presumed that my comment above with the regex I am using would be enough for people to find and reference.
– SGD
Nov 14 at 15:41
edited Nov 13 at 18:29
answered Nov 13 at 17:45
Ismael Miguel