How to simplify this regular expression to use in Google Analytics
Context: Google Analytics
Need: a filter that, given a URI or a URN (yes, a URN), returns everything up to, but not including, the query string or fragment.
As you can imagine, there are multiple variations out there, which I hope the list below covers in full:
https://sub.domain.com/path/folder/article?l=en >> expected https://sub.domain.com/path/folder/article
https://sub.domain.com/path/folder/103#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103?#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103#?3173l=en
sub.domain.tld >> expected sub.domain.tld
sub.domain.tld/ >> expected sub.domain.tld
sub.domain.tld?param=value >> expected sub.domain.tld
sub.domain.tld/?param=value >> expected sub.domain.tld
sub.domain.tld?param=value#id >> expected sub.domain.tld
sub.domain.tld/?param=value#id >> expected sub.domain.tld
sub.domain.tld/folder >> expected sub.domain.tld/folder
sub.domain.tld/folder/ >> expected sub.domain.tld/folder
sub.domain.tld/folder?param=value >> expected sub.domain.tld/folder
sub.domain.tld/folder/?param=value >> expected sub.domain.tld/folder
sub.domain.tld/1/folder >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder/ >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder?param=value
sub.domain.tld/1/folder/?param=value
sub.domain.tld#id
sub.domain.tld/#id
sub.domain.tld/1#id
sub.domain.tld/1/#id
The challenge I cannot solve is writing a regular expression that always returns the wanted part in the same subgroup.
If you want to play around, I have saved a couple of tests at
- https://regex101.com/r/trZl06/1/
- https://regex101.com/r/SetgFn/2
The latter captures my cases quite well, but as soon as a capturing group is added in front of the existing matching condition, the group also grabs text that is not expected.
I also tried something like ((.*)(?:[/]?.*)|(.*)(?:\?.*))|((.*)/$|(.*))
but the resulting subgroups are different each time, which makes referencing them in the filter view a bit of a mess.
Is there anything you can think of?
regex google-analytics
Try ^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?, see regex101.com/r/fyGAJc/1
– Wiktor Stribiżew
Nov 21 '18 at 22:01
Thanks Wiktor. That's heading in the right direction. The last missing bit is to move the trailing slash, when present, into the next group, so as to avoid GA traffic being split across pages that are virtually the same. Unfortunately I can't implement server-side rules to solve this.
– Andrea Moro
Nov 22 '18 at 6:06
The strange thing here is that the [/#] doesn't seem to catch the /. I tried playing around with the permutations, but that doesn't make sense.
– Andrea Moro
Nov 22 '18 at 7:00
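A likely explanation, sketched in Python rather than in GA's own regex engine: the first group of the pattern from the first comment is greedy, so it consumes a trailing / before the second group ever gets a chance at it. The pattern below assumes the literal question mark was written as \? in the original comment, and first_group is a hypothetical helper name used only for this illustration.

```python
import re

# Pattern from the first comment, assuming the literal "?" was escaped as "\?".
GREEDY = re.compile(r'^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?')


def first_group(url: str) -> str:
    """Return what the greedy first group captures for `url`."""
    match = GREEDY.match(url)
    return match.group(1) if match else ""


# The trailing slash stays inside group 1 because [^#?]* also matches "/".
for url in (
    "sub.domain.tld/folder/",
    "sub.domain.tld/1/#id",
    "sub.domain.tld/folder?param=value",
):
    print(url, "->", first_group(url))
```

Under that assumption, the first two inputs keep their trailing slash in group 1 (sub.domain.tld/folder/ and sub.domain.tld/1/), while the third yields sub.domain.tld/folder.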
Try regex101.com/r/fyGAJc/2
– Wiktor Stribiżew
Nov 22 '18 at 7:59
I eventually solved it with a second filter in GA that strips the last slash, but having everything in one go is ultimately better. Thanks. I will compare the changes to understand my mistakes.
– Andrea Moro
Nov 23 '18 at 7:00
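The two-filter workaround mentioned in the comment above can be sketched as two passes. This is only a Python illustration of the idea, not GA filter configuration; the function names are hypothetical and the exact patterns used in the actual GA filters are not given in the thread.

```python
import re


def strip_query_and_fragment(url: str) -> str:
    """First pass: keep everything before the first '?' or '#'."""
    # [^#?]* can match the empty string, so this match never fails.
    return re.match(r'^([^#?]*)', url).group(1)


def strip_trailing_slash(url: str) -> str:
    """Second pass: drop a single trailing '/'."""
    return re.sub(r'/$', '', url)


def normalize(url: str) -> str:
    """Apply both passes in order, as two chained filters would."""
    return strip_trailing_slash(strip_query_and_fragment(url))


print(normalize("sub.domain.tld/folder/?param=value"))  # sub.domain.tld/folder
print(normalize("sub.domain.tld/1/#id"))                # sub.domain.tld/1
```

Splitting the work this way keeps each pattern trivial, at the cost of maintaining two filters instead of one.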
asked Nov 21 '18 at 21:54
Andrea Moro
1 Answer
You may use
^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$
See the regex demo.
Details
- ^ - start of string
- ([^#?]*?) - Group 1: 0 or more chars other than # and ?, as few as possible
- ([/?#]?\?.*|[/#]?#.*)? - an optional Group 2, either of the two:
  - [/?#]?\?.* - an optional /, ? or # followed by a ? char and then the rest of the string
  - | - or
  - [/#]?#.* - an optional / or # followed by a # char and then the rest of the string
- (/?) - Group 3: an optional /
- $ - end of string.
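To check the pattern against the cases from the question outside regex101, here is a minimal Python sketch. It assumes the literal question mark in Group 2 is escaped as \?, as the description of Group 2 implies, and that Python's re module behaves like GA's engine for this pattern, which is not verified here. PATTERN, clean and CASES are names made up for the example.

```python
import re

# Answer's pattern, assuming the literal "?" in Group 2 is escaped as "\?".
PATTERN = re.compile(r'^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$')


def clean(url: str) -> str:
    """Return Group 1: the URL without query, fragment, or trailing slash."""
    match = PATTERN.match(url)
    # Fall back to the input if nothing matches (e.g. embedded newlines).
    return match.group(1) if match else url


# A subset of the inputs from the question, with their expected outputs.
CASES = {
    "https://sub.domain.com/path/folder/article?l=en": "https://sub.domain.com/path/folder/article",
    "https://sub.domain.com/path/folder/103#3173l=en": "https://sub.domain.com/path/folder/103",
    "https://sub.domain.com/path/folder/103?#3173l=en": "https://sub.domain.com/path/folder/103",
    "sub.domain.tld": "sub.domain.tld",
    "sub.domain.tld/": "sub.domain.tld",
    "sub.domain.tld/?param=value#id": "sub.domain.tld",
    "sub.domain.tld/folder/": "sub.domain.tld/folder",
    "sub.domain.tld/1/folder/?param=value": "sub.domain.tld/1/folder",
}

for url, expected in CASES.items():
    result = clean(url)
    print(f"{url} -> {result} ({'ok' if result == expected else 'MISMATCH'})")
```

Because Group 1 is lazy and the pattern is anchored with $, the wanted part always lands in Group 1, whichever of the optional groups ends up matching.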
answered Nov 23 '18 at 7:51
Wiktor Stribiżew