Regex to parse a line and capture a string and a comma-separated number

I am trying to parse a file with lines similar to:

       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29

I need to capture the name and the number in the first numeric column. The end result should be:

John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188

I've tried:

\s*\b(.*)\b(\s*\.\s*.*)(\d+,\d+|\d+)\b

\s*\b(.*)\b(\.|\.\s)+\b(\d+,\d+|\d+)\b

Any suggestions?

edited Nov 10 at 15:27 by Salman A
asked Nov 9 at 21:20 by sho

Is the data always column aligned?
– Salman A
Nov 9 at 21:26

@SalmanA yes. They use periods and spaces to separate the names from the numbers
– sho
Nov 9 at 21:27

Then use substr. Not regex.
– Salman A
Nov 9 at 21:28

@SalmanA the length of the name varies and the value could be 1 - 5 digits.
– sho
Nov 9 at 21:32

php regex string parsing delimited-text

3 Answers

Accepted answer (score 1)

This pattern captures the name by looking for the dot leader (".  .") that follows it, then captures the first run of digits and commas after that as the number.

A loop then builds the new array, removing the comma from each number.

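$str = '       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29';

// Lazily capture the name up to the first ".  ." dot leader,
// then capture the first run of digits and commas after it.
preg_match_all("/\s*(.*?)\s*\.  \..*?([\d,]+)/", $str, $matches);

$new = [];
foreach ($matches[1] as $key => $name) {
    // Rebuild "name,number" with the thousands separator removed.
    $new[] = $name . "," . str_replace(",", "", $matches[2][$key]);
}

var_dump($new);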

Output:

array(5) {
  [0]=>
  string(27) "John David James (DEM),7808"
  [1]=>
  string(26) "Marvin D. Scott (DEM),6548"
  [2]=>
  string(32) "Maria "Mary" Williams (DEM),4551"
  [3]=>
  string(22) "Dwayne R. Johnson,4322"
  [4]=>
  string(12) "WRITE-IN,188"
}

https://3v4l.org/SdqoZ

answered Nov 9 at 22:04 by Andreas

Thanks @Andreas. This works great. This version actually simplifies my work even more since I can work with the name and count separately.
– sho
Nov 9 at 23:20
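Since the goal here is to work with the name and count separately (and, per a later comment, to turn them into a JSON array), a minimal sketch could keep the accepted answer's pattern but pass PREG_SET_ORDER so each match comes back as one row. The function name parseResults and the name/count keys are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns a list of ['name' => string, 'count' => int] rows.
function parseResults(string $text): array
{
    // Same pattern as the accepted answer: lazily capture the name up to the
    // first ".  ." dot leader, then the first run of digits and commas after it.
    preg_match_all('/\s*(.*?)\s*\.  \..*?([\d,]+)/', $text, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [
            'name'  => $m[1],
            // Strip the thousands separator before converting to an integer.
            'count' => (int) str_replace(',', '', $m[2]),
        ];
    }
    return $rows;
}

$str = <<<'TXT'
       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29
TXT;

$rows = parseResults($str);
echo json_encode($rows), "\n";
```

Casting the count to int means json_encode emits it as a JSON number rather than a string.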

Answer (score 1)

You can achieve this with an ungreedy regex.

When we capture the name, we want "a sequence of any characters followed by a sequence of dots and spaces". The equivalent regex is (.+)[. ]*.

By default, however, the engine is greedy. The first part, (.+), will not stop at the first dot or space it encounters: since the rest of the pattern can still match all the way to the end of the line, a greedy quantifier takes that longer path.

The same goes for the full regex in the working code below: in greedy mode, the first capturing group would capture beyond the name field.

We need to tell the engine to "eat" as little as it can, which is what the U (ungreedy) flag does.
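A minimal sketch of that difference, on one sample line with its leading indentation removed (an assumption to keep the demo short; the variable names are illustrative). Adding ? after a single quantifier (.+?) makes only that quantifier lazy, while the U flag makes every quantifier lazy at once, including \s* and ?. That is likely why the full pattern anchors the name with \S on both ends.

```php
<?php
// Sample line with the leading indentation removed to keep the demo short.
$line = 'John David James (DEM) .  .  .  .  .  .     7,808   10.51';

// Greedy: (.+) first takes the whole line, then backtracks only until "[. ]+\d"
// can match, which happens at ".5" in "10.51". Group 1 runs past the name field.
preg_match('/(.+)[. ]+\d/', $line, $greedy);

// Lazy: (.+?) grows one character at a time and stops at the first position
// where "[. ]+\d" can match, right after "(DEM)".
preg_match('/(.+?)[. ]+\d/', $line, $lazy);

// The U flag inverts greediness for every quantifier, so (.+) behaves like (.+?).
preg_match('/(.+)[. ]+\d/U', $line, $ungreedy);

echo $greedy[1], "\n";   // John David James (DEM) .  .  .  .  .  .     7,808   10
echo $lazy[1], "\n";     // John David James (DEM)
echo $ungreedy[1], "\n"; // John David James (DEM)
```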

<?php

$lines = '
       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29
';

$lines = explode("\n", $lines);

// Here, the U flag sets the ungreedy mode
$pattern = '/^\s*(\S.+\S)[. ]+([0-9]+)(?:,([0-9]+))?\s.*$/U';

echo "<pre>";
foreach ($lines as $line) {
    // Here: - ${1} captures the name,
    //       - ${2} the digits before the thousands separator,
    //       - ${3} the digits after it (empty when there is no comma)
    echo preg_replace($pattern, '${1},${2}${3}', $line) . "\n";
}
echo "</pre>";

?>

Result:

John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188

edited Nov 9 at 23:15
answered Nov 9 at 21:54 by Amessihel

Split()? From the manual: "This function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0." Just to be clear, I did not downvote; I only wanted to ask why you would use a deprecated function.
– Andreas
Nov 9 at 22:06

Yes, I saw your comment and I fixed my code. I was busy adding more explanations. Thanks.
– Amessihel
Nov 9 at 22:19

Just another heads up, OP does not want the comma in the number.
– Andreas
Nov 9 at 22:25

Thanks for the extremely detailed description!
– sho
Nov 9 at 23:12

Thanks Amessihel. Your response was great, but I picked @Andreas's version since his code gave me the name and count as variables that I could work with individually. I converted the names and numbers into a JSON array to use elsewhere.
– sho
Nov 9 at 23:23
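If you would rather not flip every quantifier with U, one alternative sketch is to make only the name quantifier lazy (.+?) and leave the rest greedy; on the sample lines this should give the same results as the U-flag pattern. It uses preg_match instead of preg_replace so the name and count come back as separate values. parseLine is a hypothetical helper name, not from either answer.

```php
<?php
// Hypothetical helper: same idea as the U-flag pattern, but only the name
// quantifier is lazy (.+?); the other quantifiers keep their default greedy behavior.
function parseLine(string $line): ?array
{
    $pattern = '/^\s*(\S.+?\S)[. ]+(\d+)(?:,(\d+))?\s.*$/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // blank or unexpected line
    }
    // Group 3 is missing from $m when the number has no comma (e.g. "188").
    return [$m[1], (int) ($m[2] . ($m[3] ?? ''))];
}

$lines = [
    '       John David James (DEM) .  .  .  .  .  .     7,808   10.51',
    '       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55',
    '       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58',
    '       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22',
    '       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29',
];

foreach ($lines as $line) {
    $row = parseLine($line);
    if ($row !== null) {
        echo $row[0], ',', $row[1], "\n";
    }
}
```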

Answer (score 1)

If the data is column-aligned (all columns have a known, fixed width), then use string functions such as substr:

<?php

$lines = '
       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29
';

foreach (preg_split('/(\r|\n)+/', $lines) as $line) {
    if ($line === '') continue;
    $name = substr($line, 0, 46);     // name column, including the dot leader
    $amount = substr($line, 46, 10);  // first numeric column
    $name = rtrim(ltrim($name), " .");
    $amount = (float) str_replace(",", "", $amount);
    echo $name . "," . $amount . "\n";
}

edited Nov 9 at 23:45
answered Nov 9 at 21:36 by Salman A

Thanks. This works well.
– sho
Nov 9 at 23:10
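The same fixed-width idea can be wrapped in a small function that returns the name and count instead of echoing them, so the two fields can be used separately. This is a sketch: the widths 46 and 10 are taken from the substr answer and assume the exact alignment of the sample; the function name and the integer cast are illustrative choices.

```php
<?php
// Hypothetical helper: split one fixed-width line into [name, count].
// The default widths (46 and 10) fit only this exact sample layout.
function parseFixedWidth(string $line, int $nameWidth = 46, int $countWidth = 10): array
{
    // Name column: strip the indentation, then the trailing dot leader and spaces.
    $name = rtrim(ltrim(substr($line, 0, $nameWidth)), ' .');
    // Count column: drop the thousands separator and the padding spaces.
    $count = (int) trim(str_replace(',', '', substr($line, $nameWidth, $countWidth)));
    return [$name, $count];
}

$sample = [
    '       John David James (DEM) .  .  .  .  .  .     7,808   10.51',
    '       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55',
    '       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58',
    '       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22',
    '       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29',
];

foreach ($sample as $line) {
    [$name, $count] = parseFixedWidth($line);
    echo $name, ',', $count, "\n";
}
```

Note that rtrim with " ." also strips a period that belongs to the name itself. Here that is what the expected output wants ("Johnson." becomes "Johnson"), but a name ending in something like "Jr." would lose its dot.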
