Unclear note in the October 2018 release of the Custom Translator User Guide
Can anyone clarify what the following note exactly means?
NOTE: There must not be any new line characters; “\n” or “\r” at the end of sentences. If there are then the alignment of sentences will be corrupted and the training will not be effective.
The note appears on page 5, section 2.1.2.1 Parallel documents.
Does this apply to all document formats? It does not make much sense to me, for instance for .align documents...
microsoft-translator
asked Nov 8 at 13:19
Albert Llorens
Thank you for bringing this to our attention. We will update the documentation, because this statement is inaccurate. It should read:
"NOTE: There must not be any new line characters; “\n” or “\r” within a sentence. If there are then the alignment of sentences will be corrupted and the training will not be effective."
The issue we want to address here is that parallel documents should not break a single sentence across multiple lines, because doing so makes sentence alignment much less effective.
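If you want to catch such breaks before uploading a parallel document, a rough pre-check is possible. The sketch below is a heuristic for illustration only, not something Custom Translator itself does: it assumes that a non-empty line that does not end in sentence-final punctuation is probably a sentence wrapped onto the next line. The function name `find_suspect_breaks` is hypothetical.

```python
import re

# Heuristic (assumption): a sentence normally ends in . ! ? (or the CJK
# equivalents), optionally followed by closing quotes or brackets.
TERMINAL = re.compile(r"[.!?。！？][\"'”’)\]]*$")


def find_suspect_breaks(text: str) -> list[int]:
    """Return 1-based line numbers that may end in the middle of a sentence."""
    suspects = []
    # splitlines() treats \n, \r\n and a bare \r all as line boundaries,
    # so a stray \r inside a sentence is caught the same way as \n.
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not TERMINAL.search(stripped):
            suspects.append(number)
    return suspects


sample = "This is a complete sentence.\nThis sentence was\r\nbroken across two lines.\n"
print(find_suspect_breaks(sample))  # [2]
```

Expect false positives on headings, titles and list items, which legitimately lack final punctuation, so review the flagged lines rather than joining them automatically.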
Regarding your question about .align files: we do not sentence-align these files, so you could break sentences across multiple lines as long as you do it consistently. That is, if a sentence is broken into three lines on the source side, it must be broken into three lines on the target side. Since the sentence aligner is not used, even one unmatched split would misalign all the following sentences. There is no advantage to splitting sentences, so I strongly urge you not to do that.
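To see why one unmatched split is so damaging, the sketch below pairs the two sides of an .align file purely by position. It assumes, as the description above implies, that without a sentence aligner line N of the source is paired with line N of the target; the helper `pair_align_lines` and the sample sentences are hypothetical.

```python
def pair_align_lines(source: str, target: str) -> list[tuple[str, str]]:
    """Pair line N of the source with line N of the target.

    Assumption for illustration: this mirrors the positional pairing that
    .align files imply when no sentence aligner is used.
    """
    src_lines = source.splitlines()
    tgt_lines = target.splitlines()
    if len(src_lines) != len(tgt_lines):
        # One unmatched split shifts every later pair, so refuse to pair.
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    return list(zip(src_lines, tgt_lines))


source = "The file is open.\nSave your work.\nClose the window.\n"
# The first Spanish sentence was split across two lines; the source was not.
target = "El archivo\nestá abierto.\nGuarde su trabajo.\nCierre la ventana.\n"

# Naive positional pairing shows the shift: every pair from the split onward
# is misaligned, and the last target line is silently dropped.
for src, tgt in zip(source.splitlines(), target.splitlines()):
    print(f"{src!r} -> {tgt!r}")

try:
    pair_align_lines(source, target)
except ValueError as err:
    print(err)  # line count mismatch: 3 source vs 4 target
```

A line-count comparison like this only catches an odd number of unmatched splits overall; two opposite mistakes can cancel out, so it is a sanity check, not a guarantee of alignment.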
answered Nov 8 at 19:59
ScottG