Replace junk character apostrophe using regex
All the apostrophes in my HTML are being converted to junk by the UI engine. I need to create a regex with the pattern below to replace the junk in Java.
The specific pattern is needed because some characters are displayed as junk in the HTML. The whole string can look like: `company㝵20ac?s`
[2 characters] + "20ac" + [1 character]
I need to replace this whole sequence with a single quote. Something like:
string.replaceAll(<regex>, "'");
It should not be like this in the first place, but once the text is saved in the database, the junk characters can no longer be parsed by Java or HTML.
java regex apostrophe
edited Nov 19 '18 at 7:19 by Oram
asked Nov 19 '18 at 3:23 by Riju Mahna
The Unicode code point `\u20ac` (U+20AC) is the EURO SIGN character. The best you can do is to replace it with the sequence `EUR`, so you don't have any more problems with euros. A little more context, or a better description of your problem (better than calling it "junk"), could lead to a better solution. Probably, if the Java HTML generator emitted an escaped character sequence for `€` instead of the raw `€` character, browsers could display it properly. What are the two characters that precede the `20ac` sequence?
– Luis Colorado
Nov 22 '18 at 5:53
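If the cause is what the comment hints at, a character-encoding mismatch, it can be reproduced in a few lines. This sketch assumes (the question does not confirm it) that the apostrophe is U+2019 RIGHT SINGLE QUOTATION MARK, written as UTF-8 and later decoded as Windows-1252: the three UTF-8 bytes E2 80 99 then turn into `â€™`, whose middle character is U+20AC. `MojibakeDemo`, `garble` and `repair` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Windows-1252 ships with standard JDKs but is not one of the charsets
    // guaranteed by StandardCharsets, so it is looked up by name.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the suspected bug: UTF-8 bytes decoded as Windows-1252.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Reverses that single wrong decode. Only works if the garbled text
    // was not altered further (e.g. a character replaced by '?').
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "company\u2019s"; // U+2019 RIGHT SINGLE QUOTATION MARK
        String garbled = garble(original);  // "company" + U+00E2 U+20AC U+2122 + "s"
        String repaired = repair(garbled);
        System.out.println(garbled.equals("company\u00e2\u20ac\u2122s")); // true
        System.out.println(repaired.equals(original));                   // true
    }
}
```

If that is the cause, fixing the charset where the HTML is read or written addresses the source. `repair` only helps while the garbled text is intact; if characters were already lost before saving to the database, a regex cleanup like the one in the answer below is the fallback.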
1 Answer
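If you want any two characters followed by `20ac` and then one more character, replaced by a single quote, you can do something like this:
string.replaceAll("..20ac.", "'");
The `.` means any character.
If you also want to keep the `20ac` part, wrap it in parentheses: what is inside the parentheses is captured and can be reused in the replacement as `$1`, for example:
string.replaceAll("..(20ac).", "'$1'");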
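A minimal runnable sketch of the two `replaceAll` calls above. The class and method names, and the sample inputs (`XY` and `?` standing in for junk characters), are illustrative assumptions. The last line shows a caveat with the question's own sample: it shows only one junk character before `20ac`, so `..` also consumes the `y` of `company`.

```java
public class JunkApostropheDemo {
    // Any two characters, the literal text "20ac", then any one character,
    // all replaced by a single apostrophe.
    static String replaceJunk(String s) {
        return s.replaceAll("..20ac.", "'");
    }

    // Same match, but "20ac" is captured as group 1 and written back via $1.
    static String replaceKeepingGroup(String s) {
        return s.replaceAll("..(20ac).", "'$1'");
    }

    public static void main(String[] args) {
        // "XY" and "?" stand in for the junk characters (illustrative input).
        System.out.println(replaceJunk("companyXY20ac?s"));         // company's
        System.out.println(replaceKeepingGroup("companyXY20ac?s")); // company'20ac's

        // With only one junk character (U+3775) before "20ac", ".." also eats the "y".
        System.out.println(replaceJunk("company" + "\u3775" + "20ac?s")); // compan's
    }
}
```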
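If you want to replace only the junk characters, you need to list them in the regex instead of using `.`. It can be something like this:
[㝵]
(put all the junk characters inside the brackets).
For repeated characters you can use `*` for zero or more, `+` for one or more, and `{2}` for exactly two.
So the end result can be something like this:
[㝵]+(20ac)?
Note that `?` here is a quantifier that makes the `(20ac)` group optional, so this pattern does not consume the character that follows `20ac`; to match a literal `?` instead, escape it as `\\?` in a Java string literal.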
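A sketch of the character-class approach, assuming U+3775 (`㝵`, the character in the question's sample) is the only junk character; the class name and the `clean` and `cleanOptionalGroup` helpers are hypothetical. It precompiles a variant that also matches the one trailing character from the question's pattern, and runs the `[㝵]+(20ac)?` pattern for comparison, which leaves the trailing `?` behind because nothing after `20ac` is matched.

```java
import java.util.regex.Pattern;

public class JunkClassDemo {
    // Character class of known junk characters (only U+3775 here, taken from
    // the question's sample; add any others you see), one or more of them,
    // then "20ac", then exactly one trailing character.
    private static final Pattern JUNK = Pattern.compile("[\u3775]+20ac.");

    static String clean(String s) {
        return JUNK.matcher(s).replaceAll("'");
    }

    // The pattern [U+3775]+(20ac)? as given above: the group is optional
    // and nothing after "20ac" is consumed.
    static String cleanOptionalGroup(String s) {
        return s.replaceAll("[\u3775]+(20ac)?", "'");
    }

    public static void main(String[] args) {
        String sample = "company" + "\u3775" + "20ac?s";
        System.out.println(clean(sample));              // company's
        System.out.println(cleanOptionalGroup(sample)); // company'?s
    }
}
```

If exactly two junk characters always precede `20ac`, `{2}` can replace `+` in the character-class pattern.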
edited Nov 19 '18 at 7:23
answered Nov 19 '18 at 7:00 by Oram