How to simplify this regular expression to use in Google Analytics




















Context: Google Analytics



Need: A filter that, given a URI or a URN (yes, a URN), returns everything up to, but excluding, the query string.



As you can imagine, there are multiple variations out there, which I hope the list below covers in full:



https://sub.domain.com/path/folder/article?l=en >> expected https://sub.domain.com/path/folder/article
https://sub.domain.com/path/folder/103#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103?#3173l=en >> expected https://sub.domain.com/path/folder/103
https://sub.domain.com/path/folder/103#?3173l=en
sub.domain.tld >> expected sub.domain.tld
sub.domain.tld/ >> expected sub.domain.tld
sub.domain.tld?param=value >> expected sub.domain.tld
sub.domain.tld/?param=value >> expected sub.domain.tld
sub.domain.tld?param=value#id >> expected sub.domain.tld
sub.domain.tld/?param=value#id >> expected sub.domain.tld
sub.domain.tld/folder >> expected sub.domain.tld/folder
sub.domain.tld/folder/ >> expected sub.domain.tld/folder
sub.domain.tld/folder?param=value >> expected sub.domain.tld/folder
sub.domain.tld/folder/?param=value >> expected sub.domain.tld/folder
sub.domain.tld/1/folder >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder/ >> expected sub.domain.tld/1/folder
sub.domain.tld/1/folder?param=value
sub.domain.tld/1/folder/?param=value
sub.domain.tld#id
sub.domain.tld/#id
sub.domain.tld/1#id
sub.domain.tld/1/#id


The challenge I cannot solve is writing a regular expression that always returns the wanted part in the same capturing group.



If you want to experiment, I have saved a couple of tests at:
- https://regex101.com/r/trZl06/1/
- https://regex101.com/r/SetgFn/2



The latter captures my cases quite well, but as soon as a capturing group is added in front of the existing pattern, that group also captures text that should be excluded.



I also tried something like ((.*)(?:[/]?.*)|(.*)(?:?.*))|((.*)/$|(.*)), but the resulting capturing groups differ from case to case, which makes referencing them in the filter view a bit of a mess.



Is there anything you can think of?

































  • Try ^([^#?]*)([/?#]?\?.*|/$|[/#]#.*|#.*)?, see regex101.com/r/fyGAJc/1

    – Wiktor Stribiżew
    Nov 21 '18 at 22:01











  • Thanks Wiktor. That's on the right track. The last missing bit is to move the trailing slash, when present, into the next group, to avoid GA traffic dispersion across pages that are effectively the same. Unfortunately, I can't implement server-side rules to solve this.

    – Andrea Moro
    Nov 22 '18 at 6:06











  • The strange thing is that [/#] doesn't seem to catch the /. I tried several permutations, but the result still doesn't make sense to me.

    – Andrea Moro
    Nov 22 '18 at 7:00






  • Try regex101.com/r/fyGAJc/2

    – Wiktor Stribiżew
    Nov 22 '18 at 7:59











  • I eventually solved it with a second filter in GA that strips the trailing slash, but having everything in one go is ultimately better. Thanks. I will compare the changes to understand my mistakes.

    – Andrea Moro
    Nov 23 '18 at 7:00

























regex google-analytics
















asked Nov 21 '18 at 21:54









Andrea Moro

























1 Answer














You may use



^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$


See the regex demo.



Details





  • ^ - start of string


  • ([^#?]*?) - Group 1: 0 or more chars other than # and ?, as few as possible


  • ([/?#]?\?.*|[/#]?#.*)? - an optional Group 2: either of the two:



    • [/?#]?\?.* - an optional /, ? or # followed by a ? char and then the rest of the string


    • | - or


    • [/#]?#.* - an optional / or # followed by a # char and then the rest of the string




  • (/?) - Group 3: an optional /


  • $ - end of string.
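To check the pattern against the cases from the question outside of Google Analytics, a small script can help. The sketch below is only an illustration: it assumes Python's `re` module, whose behaviour may differ in details from the regex engine GA uses, and the helper name `strip_query` is hypothetical. The pattern is written with the literal `?` escaped, as described in the details above.

```python
import re

# Pattern from the answer. Group 1 holds the part to keep; Group 2 holds the
# query string and/or fragment; Group 3 holds an optional trailing slash.
PATTERN = re.compile(r"^([^#?]*?)([/?#]?\?.*|[/#]?#.*)?(/?)$")


def strip_query(url: str) -> str:
    """Return Group 1, i.e. the input without query string, fragment,
    or trailing slash. (Hypothetical helper, for illustration only.)"""
    match = PATTERN.match(url)
    # Fall back to the unchanged input if the pattern does not match.
    return match.group(1) if match else url


if __name__ == "__main__":
    # A few of the sample inputs listed in the question.
    samples = [
        "https://sub.domain.com/path/folder/article?l=en",
        "https://sub.domain.com/path/folder/103?#3173l=en",
        "sub.domain.tld/",
        "sub.domain.tld/folder/?param=value",
        "sub.domain.tld/1/#id",
    ]
    for sample in samples:
        print(f"{sample} -> {strip_query(sample)}")
```

Because Group 1 is lazy and cannot contain `#` or `?`, the value to keep always ends up in the same group, which is what the filter field in GA would reference.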
















answered Nov 23 '18 at 7:51

Wiktor Stribiżew































