awk: Why are spaces delimiting, instead of FPAT regexp












I'm attempting to split strings delimited by ',' except where the ',' is inside a substring enclosed by brackets. Modifying other solutions here and examples in the docs, I tried this test:



awk -v FPAT='([^,]+)|(([^))+))' '{
    for (i=1; i<=NF; i++) {
        printf("%s\n", $i)
    }
}' <<< 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
one
two
(1one),
three
four
(3three,
4four),
five
six,
seven
eight,
nine
ten
eleven
(8ten)


The FPAT isn't overriding the default delimiter as I expected, so clearly I'm missing something.



The output I want is:



one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)









regex awk

asked Nov 13 at 7:17 by dls49
edited Nov 13 at 7:37 by Inian
























          2 Answers
          Using GNU grep:

          s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
          grep -oP '\s*\K([^,(]*\([^)]*\))*[^,]*(,|$)' <<< "$s"

          one two (1one),
          three four (3three, 4four),
          five six,
          seven eight,
          nine ten eleven (8ten)

          If you don't have GNU grep, then you may use:

          grep -oE '([^,(]*\([^)]*\))*[^,]*(,\s*|$)' <<< "$s"

          This leaves the trailing space after each comma in the output.

          For regex explanation see this demo.







          answered Nov 13 at 7:41 by anubhava
          edited Nov 13 at 7:46












          • Thanks, this gives me a usable workaround. However, I'd also like to know how I could have used awk.
            – dls49, Nov 13 at 8:01

          • That regex demo looks very useful. Thanks.
            – dls49, Nov 14 at 7:20


















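          Your code does not work for two reasons:

          1. ([^,]+)|(([^))+)) is an invalid regex: it has an unmatched [ in it.

          2. You say you're using mawk, but mawk doesn't support FPAT. It treats FPAT as an ordinary variable and keeps splitting fields on whitespace, which is why spaces still delimit.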
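
One way to check whether a given awk honours FPAT is to count the fields on a line where whitespace splitting and FPAT splitting disagree. A minimal sketch, assuming a POSIX shell; the function name awk_supports_fpat is made up for illustration and is not part of any awk:

```shell
# Hypothetical probe (not from the thread): does the awk on PATH honour FPAT?
awk_supports_fpat() {
    # With FPAT='[^,]+', gawk sees 2 fields in "a b,c d" ("a b" and "c d").
    # An awk that ignores FPAT splits on whitespace and sees 3 fields.
    n=$(printf 'a b,c d\n' | awk -v FPAT='[^,]+' '{ print NF }')
    [ "$n" -eq 2 ]
}

if awk_supports_fpat; then
    echo "FPAT is supported"
else
    echo "FPAT is ignored; fields are split on whitespace"
fi
```

If the second message is printed, any FPAT-based solution will silently fall back to whitespace splitting on that system.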




          Here is the FPAT solution I've come up with (awk here is gawk 4.2.1):



          $ cat file
          one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)
          $
          $ awk -v FPAT='[^,(]*(\([^)]*\))?(, |$)' '{ for (i=1; i<=NF; ++i) print $i }' file
          one two (1one),
          three four (3three, 4four),
          five six,
          seven eight,
          nine ten eleven (8ten)


          Explanation of the FPAT variable:

          • [^,(]* matches any number of characters other than a comma or an opening parenthesis,

          • \([^)]*\) matches any number of characters other than a closing parenthesis, surrounded by parentheses,

            • Putting this in (...)? makes this match optional.

          • (, |$) means the matched field must end with a comma followed by a space, or be the last field on the line.




          And here is how to do it in mawk:

          mawk '{ gsub(/[^,(]*(\([^)]*\))?, /, "&\n") }1' file
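
When the individual pieces are needed in an awk without FPAT, another option is to walk the line character by character and split only on commas seen at parenthesis depth zero. This is a different technique from the gsub one-liner, sketched here on the assumption that parentheses are balanced; the function name split_outside_parens is hypothetical:

```shell
# Hypothetical helper (not from the thread): split each input line on commas
# that are outside parentheses, using only POSIX awk features (no FPAT).
split_outside_parens() {
    awk '{
        depth = 0; field = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "(") depth++
            else if (c == ")" && depth > 0) depth--
            field = field c
            if (c == "," && depth == 0) {
                sub(/^ +/, "", field)   # drop spaces that followed the previous comma
                print field
                field = ""
            }
        }
        sub(/^ +/, "", field)
        if (field != "") print field    # last piece has no trailing comma
    }'
}

printf '%s\n' 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)' |
    split_outside_parens
```

On the sample line this prints the five lines asked for in the question, each keeping its trailing comma. Because it counts depth rather than matching a regex, it also copes with nested parentheses.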




          sed could be used as well for this particular case:

          sed 's/[^,(]*\(([^)]*)\)\?, /&\n/g' file






          answered Nov 13 at 7:40 by oguzismail
          edited Nov 13 at 8:27












          • This does the same as my original output on my system (mawk 1.3.3). What version are you on?
            – dls49, Nov 13 at 7:53

          • gawk 4.2.1, I'm gonna check out mawk now.
            – oguzismail, Nov 13 at 7:55

          • @dls49 updated my answer, check it out.
            – oguzismail, Nov 13 at 8:04

          • Oh! mawk doesn't support FPAT, OK. That explains why even fixing my regex didn't work. Thank you! Your sed solution gives the required result, but not your mawk solution. The latter is also splitting when the comma occurs inside brackets.
            – dls49, Nov 13 at 8:32

          • Yep, you're welcome. I noticed it and fixed it 5 mins ago; does it still not work?
            – oguzismail, Nov 13 at 8:34

















