Replace junk character apostrophe using regex












Every apostrophe in my HTML is being converted to junk by the UI engine. I need a regex for the pattern below so that I can replace the junk in Java.

The specific pattern is needed because some characters from the HTML are displayed as junk. The whole string can look like this: company㝵20ac?s

[2 characters]+"20ac"+[1 character]

I need to replace this whole sequence with a single quote. Something like:

string.replaceAll(<regex>, "'");

It should not have to be done this way, but once the junk characters are saved in the database they can no longer be parsed by Java or HTML.










  • The Unicode code point written as \u20ac is the EURO SIGN character. The best you can do is to replace it with the sequence EUR, so that you don't have any more problems with euros. A little more context, or a better description of your problem (better than calling the thing junk), could lead to a better solution. If the HTML generator emitted an escape sequence instead of the raw character, browsers could probably display it properly. What are the two characters that precede the 20ac sequence?

    – Luis Colorado
    Nov 22 '18 at 5:53


















java regex apostrophe






edited Nov 19 '18 at 7:19

Oram

asked Nov 19 '18 at 3:23

Riju Mahna



























1 Answer














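If you want to match any 2 characters followed by 20ac and then one more character, you can do something like this:

string.replaceAll("..(20ac).", "'$1'");

The . matches any single character (except a line terminator, by default).
Whatever is inside the parentheses is captured and can be reused in the replacement as $1, so this call produces '20ac'. To replace the whole match with just a single quote, as the question asks, use "'" as the replacement.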
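As a minimal sketch of the approach described above, the snippet below replaces the whole matched sequence with a single quote. The class name JunkApostropheFix, the fix helper, and the sample input are hypothetical; "##" and "?" are ASCII placeholders standing in for the unreadable characters.

```java
final class JunkApostropheFix {
    // Two arbitrary characters, the literal text "20ac", then one more arbitrary character.
    private static final String JUNK = ".{2}20ac.";

    // Returns a new string; the input is not modified because Java strings are immutable.
    static String fix(String input) {
        return input.replaceAll(JUNK, "'");
    }

    public static void main(String[] args) {
        // Placeholder input (assumption): "##" and "?" stand in for the junk characters.
        System.out.println(fix("company##20ac?s")); // company's
    }
}
```

Because . accepts any character, this pattern also rewrites ordinary text that happens to contain 20ac, so it is only safe if that sequence never occurs legitimately in the data.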



If you want to replace only the junk characters, you need to list them in the regex instead of using the . wildcard.

For example: [㝵] (put all the junk characters inside the brackets).

To match several characters, you can use * for zero or more, + for one or more, and {2} for exactly 2.

So the end result can be something like this: [㝵]+(20ac)?
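The character-class idea can be sketched as follows. This is an illustration under assumptions, not the exact pattern for the real data: '#' and '~' are placeholders for the actual junk characters, the trailing junk character is assumed to be a literal '?', and the names JunkClassFix and fix are hypothetical.

```java
import java.util.regex.Pattern;

final class JunkClassFix {
    // '#' and '~' are placeholders for the real junk characters (assumption).
    // Exactly two of them, then the literal "20ac", then a literal '?'.
    private static final Pattern JUNK = Pattern.compile("[#~]{2}20ac[?]");

    static String fix(String input) {
        return JUNK.matcher(input).replaceAll("'");
    }

    public static void main(String[] args) {
        System.out.println(fix("company#~20ac?s")); // company's
        System.out.println(fix("model 20ac-x"));    // model 20ac-x (unchanged)
    }
}
```

Restricting the match to known junk characters leaves legitimate text containing 20ac alone, at the cost of having to enumerate every junk character that can appear.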



edited Nov 19 '18 at 7:23

answered Nov 19 '18 at 7:00

Oram





























