Regex to parse line with and capture string and comma separated number
I am trying to parse a file with lines similar to:
John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29
I need to capture the name and the number in the first column. The end result would be
John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188
I've tried
\s*\b(.*)\b(\s*.\s*.*)(\d+,\d+|\d+)\b
\s*\b(.*)\b(.|.\s)+\b(\d+,\d+|\d+)\b
Any suggestions?
php regex string parsing delimited-text
edited Nov 10 at 15:27
Salman A
asked Nov 9 at 21:20
sho
Is the data always column aligned?
– Salman A
Nov 9 at 21:26
@SalmanA yes. They use periods and spaces to separate the names from the numbers
– sho
Nov 9 at 21:27
Then use substr. Not regex.
– Salman A
Nov 9 at 21:28
@SalmanA the length of the name varies and the value could be 1 - 5 digits.
– sho
Nov 9 at 21:32
3 Answers
accepted
This pattern captures the name by matching lazily up to the start of the dot sequence that follows it.
It then captures the run of digits and commas as the number.
Finally, I loop over the matches to build a new array, removing the comma from each number.
$str = ' John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29';
preg_match_all("/\s*(.*?)\s*\. \..*?([\d,]+)/", $str, $matches);
$new = [];
foreach ($matches[1] as $key => $name) {
    // Join name and count, dropping the thousands separator.
    $new[] = $name . "," . str_replace(",", "", $matches[2][$key]);
}
var_dump($new);
Output:
array(5) {
[0]=>
string(27) "John David James (DEM),7808"
[1]=>
string(26) "Marvin D. Scott (DEM),6548"
[2]=>
string(32) "Maria "Mary" Williams (DEM),4551"
[3]=>
string(22) "Dwayne R. Johnson,4322"
[4]=>
string(12) "WRITE-IN,188"
}
https://3v4l.org/SdqoZ
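Since the name and the count land in separate capture groups, they can also be collected as pairs instead of joined strings, which is handy if the result is going to be JSON-encoded. A minimal sketch, assuming the same pattern as above; the helper name `parseResults` and the `name`/`votes` keys are made up for illustration:

```php
<?php
// Hypothetical helper: returns one ['name' => ..., 'votes' => ...] row per matched line.
function parseResults(string $text): array
{
    // Same pattern as above; PREG_SET_ORDER groups the captures per match.
    preg_match_all('/\s*(.*?)\s*\. \..*?([\d,]+)/', $text, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [
            'name'  => $m[1],
            // Drop the thousands separator and store the count as an integer.
            'votes' => (int) str_replace(',', '', $m[2]),
        ];
    }
    return $rows;
}

$str = 'John David James (DEM) . . . . . . 7,808 10.51
WRITE-IN. . . . . . . . . . . 188 .29';

echo json_encode(parseResults($str), JSON_PRETTY_PRINT), PHP_EOL;
```

Like the original pattern, this relies on every name being followed by at least two dots separated by a space.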
answered Nov 9 at 22:04
Andreas
Thanks @Andreas. This works great. This version actually simplifies my work even more since I can work with the name and count separately.
– sho
Nov 9 at 23:20
You can achieve it with an ungreedy (lazy) regexp.
To catch the name, we want "a sequence of any characters followed by a sequence of dots and spaces". So the equivalent regexp is: (.+)[. ]*
But the engine is greedy by default. What will happen? The first part, (.+), won't stop at the first dot or space it encounters. Why? Because it can consume most of the line and still let the whole expression match, and in greedy mode the engine tries the longest path first.
The same goes for the full regexp in the working code below: in greedy mode, the first capturing group would capture beyond the name field.
We need to tell the engine to "eat" as little as possible.
<?php
$lines = '
John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29
';
$lines = explode("\n", $lines);
// Here, the U flag sets the ungreedy mode
$pattern = '/^\s*(\S.+\S)[. ]+([0-9]+)(?:,([0-9]+))?\s.*$/U';
echo "<pre>";
foreach ($lines as $line) {
    // Here: - ${1} captures the name,
    //       - ${2} the digits before the thousands separator,
    //       - ${3} the digits after it (empty if there is no comma)
    echo preg_replace($pattern, '${1},${2}${3}', $line) . "\n";
}
echo "</pre>";
?>
Result:
John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188
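To see the difference in isolation, here is a small sketch that runs a greedy and a lazy name capture against one shortened sample line. The variable names are arbitrary and both patterns are simplified versions of the one above, so treat it as an illustration rather than a replacement:

```php
<?php
$line = 'Dwayne R. Johnson. . . . 4,322 8.22';

// Greedy: (.+) grabs as much as it can and only gives back what the rest of
// the pattern needs, so the name swallows the dot leaders and the vote count.
preg_match('/^(.+)[. ]+([0-9,]+)/', $line, $greedy);

// Lazy: (.+?) stops at the first point where the rest of the pattern
// (dot leaders, then a number) can match.
preg_match('/^(.+?)[. ]+([0-9,]+)/', $line, $lazy);

echo 'greedy: ', $greedy[1], ' | ', $greedy[2], PHP_EOL;
echo 'lazy:   ', $lazy[1], ' | ', $lazy[2], PHP_EOL;
```

The lazy version isolates the name and the count, while the greedy one only leaves the last two digits of the percentage for the number group.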
edited Nov 9 at 23:15
answered Nov 9 at 21:54
Amessihel
Split()? From the manual: "This function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0." Just to be clear, I did not downvote. I only mention it to ask why you would use a deprecated function.
– Andreas
Nov 9 at 22:06
Yes, I saw your comment and I fixed my code. I was busy adding more explanations. Thanks.
– Amessihel
Nov 9 at 22:19
Just another heads up, OP does not want the comma in the number.
– Andreas
Nov 9 at 22:25
Thanks for the extremely detailed description!
– sho
Nov 9 at 23:12
Thanks Amessihel. Your response was great but I picked @Andreas version since the code he provided gave me the name and count as variables that I could work with individually. I converted the names and numbers into a json array to use elsewhere.
– sho
Nov 9 at 23:23
If the data is column aligned (all columns have known, fixed width) then use string functions such as substr:
<?php
$lines = '
John David James (DEM) . . . . . . 7,808 10.51
Marvin D. Scott (DEM) . . . . . . 6,548 9.55
Maria "Mary" Williams (DEM) . . . . 4,551 8.58
Dwayne R. Johnson. . . . . . . . 4,322 8.22
WRITE-IN. . . . . . . . . . . 188 .29
';
foreach(preg_split('/(\r|\n)+/', $lines) as $line) {
if ($line === '') continue;
$name = substr($line, 0, 46);
$amount = substr($line, 46, 10);
$name = rtrim(ltrim($name), " .");
$amount = (float) str_replace(",", "", $amount);
echo $name . ", " . $amount . PHP_EOL;
}
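If the column offsets are not known in advance, the same string-function idea can work from the right-hand side of each line instead. A rough sketch, assuming every data line ends with exactly two whitespace-separated numeric columns (count, then percentage); `splitResultLine` is a hypothetical helper name, not part of the code above:

```php
<?php
// Hypothetical helper: returns [name, count] for one line, or null for blank/short lines.
function splitResultLine(string $line): ?array
{
    $tokens = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) < 3) {
        return null; // need at least a name, a count and a percentage
    }

    array_pop($tokens);          // discard the percentage column
    $count = array_pop($tokens); // the vote count, e.g. "7,808"

    // What remains is the name followed by the dot leaders.
    $name = rtrim(implode(' ', $tokens), ' .');

    return [$name, (int) str_replace(',', '', $count)];
}

var_dump(splitResultLine('Dwayne R. Johnson. . . . . . . . 4,322 8.22'));
```

One limitation of this approach: a name that itself ends in a period would lose that period, because trailing dots are treated as leaders.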
edited Nov 9 at 23:45
answered Nov 9 at 21:36
Salman A
Thanks. This works well.
– sho
Nov 9 at 23:10