awk: Why are spaces delimiting, instead of FPAT regexp
I'm attempting to split strings delimited by ',' except where the ',' is in a substring enclosed by brackets. Modifying other solutions here and examples in the docs, I tried this test:
awk -v FPAT='([^,]+)|(([^))+))' '{
for (i=1; i<=NF; i++) {
printf("%s\n", $i)
}
}' <<< 'one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
one
two
(1one),
three
four
(3three,
4four),
five
six,
seven
eight,
nine
ten
eleven
(8ten)
The FPAT isn't overriding the default delimiter as I expected, so clearly I'm missing something.
The output I want is:
one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)
regex awk
edited Nov 13 at 7:37
Inian
asked Nov 13 at 7:17
dls49
2 Answers
Using GNU grep:
s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
grep -oP '\s*\K([^,(]*\([^)]*\))*[^,]*(,|$)' <<< "$s"
one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)
If you don't have GNU grep, then you may use:
grep -oE '([^,(]*\([^)]*\))*[^,]*(,\s*|$)' <<< "$s"
which will leave trailing spaces after each comma.
For regex explanation see this demo.
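If the trailing spaces left by the grep -oE variant are unwanted, one option is to strip them in a second step. The following is only a sketch: the function name split_with_grep is made up for illustration, [[:space:]] is used in place of \s on the assumption that it is more portable, and the sed stage (including the empty-line filter) is an addition that is not part of the answer above.

```shell
#!/bin/sh
# split_with_grep (hypothetical helper): print each comma-terminated piece of
# the string in $1 on its own line, keeping commas inside parentheses intact.
split_with_grep() {
    printf '%s\n' "$1" |
        # Same idea as the -oE regex above, with [[:space:]] instead of \s.
        grep -oE '([^,(]*\([^)]*\))*[^,]*(,[[:space:]]*|$)' |
        # Assumption: trim the trailing blanks and drop any empty matches.
        sed -e 's/[[:space:]]*$//' -e '/^$/d'
}

s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
split_with_grep "$s"
```

This relies on grep supporting -o, which is common but not required by POSIX.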
edited Nov 13 at 7:46
answered Nov 13 at 7:41
anubhava
Thanks, this gives me a usable work around. However I'd also like to know how I could have used awk. – dls49 Nov 13 at 8:01
That regex demo looks very useful. Thanks. – dls49 Nov 14 at 7:20
Your code does not work because:
- ([^,]+)|(([^))+)) is an invalid regex; it has an unmatched [ in it,
- You say you're using mawk, but it doesn't support FPAT.
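Since the second point depends on which awk is installed, a quick probe can tell whether FPAT is honoured before relying on it. This is a minimal sketch under assumptions: the function name awk_has_fpat and the sample record "a b,c d" are invented for illustration. An awk that implements FPAT should see two fields here, while one that treats FPAT as an ordinary variable falls back to whitespace splitting and sees three.

```shell
#!/bin/sh
# awk_has_fpat (hypothetical helper): succeed only if the awk named in $1
# (default: awk) splits fields by FPAT rather than by whitespace.
awk_has_fpat() {
    "${1:-awk}" 'BEGIN {
        FPAT = "[^,]+"        # ignored as a plain variable where unsupported
        $0 = "a b,c d"        # re-split the record after setting FPAT
        exit (NF == 2 ? 0 : 1)
    }'
}

if awk_has_fpat; then
    echo "FPAT is supported by this awk"
else
    echo "FPAT is not supported by this awk"
fi
```

Passing a command name, for example awk_has_fpat mawk, probes a specific implementation instead of the default awk.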
Here is the FPAT solution I've come up with (FPAT is a gawk feature):
$ cat file
one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)
$
$ awk -v FPAT='[^,(]*(\([^)]*\))?(, |$)' '{ for (i=1; i<=NF; ++i) print $i }' file
one two (1one),
three four (3three, 4four),
five six,
seven eight,
nine ten eleven (8ten)
Explanation of the FPAT variable:
- [^,(]* matches any number of non-comma, non-parenthesis chars,
- \([^)]*\) matches any number of chars other than ) surrounded by parentheses,
- Putting this in (...)? makes the parenthesized part optional,
- (, |$) means the matched field should end with a comma followed by a space, or it should be the last field in the line.
And here is how to do it in mawk:
mawk '{ gsub(/[^,(]*(\([^)]*\))?, /, "&\n") }1' file
sed could be used as well for this particular case:
sed 's/[^,(]*\(([^)]*)\)\?, /&\n/g' file
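For awks without FPAT, another route is to avoid the regex altogether and scan each line character by character while counting parenthesis depth. The sketch below is an alternative technique, not the method from the answer above. The function name split_outside_parens is hypothetical, and stripping leading spaces from each piece is a choice made to match the output requested in the question.

```shell
#!/bin/sh
# split_outside_parens (hypothetical helper): print each comma-terminated
# piece of every input line on its own line, treating commas inside
# parentheses as ordinary characters. No FPAT is used.
split_outside_parens() {
    awk '{
        depth = 0; field = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "(") depth++
            else if (c == ")" && depth > 0) depth--
            field = field c
            # A comma ends the piece only when not inside parentheses.
            if (c == "," && depth == 0) {
                sub(/^ +/, "", field)
                print field
                field = ""
            }
        }
        # Emit whatever follows the last top-level comma.
        sub(/^ +/, "", field)
        if (field != "") print field
    }' "$@"
}

s='one two (1one), three four (3three, 4four), five six, seven eight, nine ten eleven (8ten)'
printf '%s\n' "$s" | split_outside_parens
```

Unlike the regex versions, the depth counter also copes with nested parentheses, at the cost of a slower per-character loop.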
edited Nov 13 at 8:27
answered Nov 13 at 7:40
oguzismail
This does the same as my original output on my system (mawk 1.3.3). What version are you on? – dls49 Nov 13 at 7:53
gawk 4.2.1, I'm gonna check out mawk now – oguzismail Nov 13 at 7:55
@dls49 updated my answer, check it out. – oguzismail Nov 13 at 8:04
Oh! mawk doesn't support FPAT ok - that explains why even fixing my regex didn't work. Thank you! Your sed solution gives the required result, but not your mawk solution. The latter is also splitting when the comma occurs inside brackets. – dls49 Nov 13 at 8:32
Yep, you're welcome. I've noticed it and fixed 5 mins ago, it still doesn't work?? – oguzismail Nov 13 at 8:34