EBNF - other character or space character?
I wonder if there is any way to know beforehand whether a character is an other character or a space character in EBNF? Right now I lex every possible variant at each position in the source string, but having to try all possible interpretations gives me a bit of a headache, especially if I also have to try all possible production rules before knowing whether a character is an other character or a space character.
To clarify: the space, ' ', is both a space character and an other character according to ISO/IEC 14977. I want to know whether there is an easier way to tell which one it is than brute-forcing every possible interpretation of the source string.
Edit (2018-01-06):
Perhaps the ambiguity can be resolved by 6.1? The text implicitly says that gap-separators have higher priority than other-characters outside terminal strings, because otherwise they would be part of the syntax? Or perhaps it defines an equivalence class of syntaxes, modulo space characters, or something like that...
ebnf
EBNF as in Extended Backus-Naur Form? Lexers normally classify characters by a simple table lookup or a call to a friendly library function which performs the table lookup; for example, what's wrong with isspace(3)?
– AlexP, Dec 21 '17 at 23:58
AlexP: the problem is not checking whether it is a space; I do that with a regex. The problem is that EBNF is not one-to-one from the source characters to its symbols, at least not in an obvious way to me.
– Emil, Dec 22 '17 at 7:12
Most computer languages are described by two grammars. One grammar describes lexical elements, such as numbers, identifiers, keywords, separators, operators, and comments; this grammar is implemented by a lexical analyzer (aka lexer). The lexer converts the source code into an intermediate form whose end symbols are annotated lexical elements; that form is then parsed by the syntactical analyzer (aka parser), which implements the second, higher-level grammar. For example, the lexer converts x2□□=□a+□/*□comment*/□□b; into IDENT(x2) OP(=) IDENT(a) OP(+) IDENT(b) SEP(;).
– AlexP, Dec 22 '17 at 13:07
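The lexer stage described in the comment above can be sketched roughly as follows. This is only an illustration for that one example line, not a lexer for EBNF itself: the token class names follow the comment's output, while TOKEN_SPEC, MASTER, and tokenize are hypothetical names, and the patterns are assumptions chosen to fit the example.

```python
import re

# Hypothetical token classes; COMMENT and SPACE are recognized but dropped.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("SPACE", r"\s+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[=+]"),
    ("SEP", r";"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Return (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x2  = a+ /* comment */  b;"))
```

Here whitespace never reaches the parser at all, which is the point of the comment: the decision about what a space means is made once, in the lexer.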
asked Dec 21 '17 at 23:30 by Emil
edited Jan 6 '18 at 16:08 by Emil
1 Answer
I wonder if there is any way to know if a character is an other character or space character beforehand in EBNF?
Yes, an other-character (including space) may appear in a terminal-string (4.17, 4.18), special-sequence (4.20), or bracketed-textual-comment (6.6). Other than that, a space is a gap-separator (6.4, 7.6).
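A minimal sketch of that rule, assuming a single left-to-right scan that only tracks which delimited construct it is currently inside. The function name classify_spaces and the role strings are hypothetical. It also simplifies the standard: comments are assumed not to nest, quotes inside comments are not treated specially, and the alternative bracket spellings are ignored.

```python
def classify_spaces(source):
    """Return (index, role) for every space character in an EBNF source.

    Simplifying assumptions: comments do not nest, and only ' " ? and
    (* ... *) open a delimited construct.
    """
    roles = []
    context = None  # None means we are outside any delimited construct
    closer = ""
    i = 0
    while i < len(source):
        if context is None:
            if source.startswith("(*", i):
                context, closer = "bracketed-textual-comment", "*)"
                i += 2
                continue
            ch = source[i]
            if ch in "'\"":
                # A terminal-string ends at the same quote that opened it.
                context, closer = "terminal-string", ch
            elif ch == "?":
                context, closer = "special-sequence", "?"
            elif ch == " ":
                roles.append((i, "gap-separator"))
            i += 1
        elif source.startswith(closer, i):
            context = None
            i += len(closer)
        else:
            if source[i] == " ":
                # Inside a delimited construct the space is ordinary content.
                roles.append((i, context))
            i += 1
    return roles


for index, role in classify_spaces("a = 'x y' ; (* c d *)"):
    print(index, role)
```

With this kind of scan the role of each space is decided by the enclosing context alone, so no backtracking over alternative interpretations is needed, at least under the simplifications stated above.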
This may be seen by substituting a different other-character, such as #, for space. In the cases mentioned (terminal-string, special-sequence, and bracketed-textual-comment), there is no effective change to the automated processing of the EBNF, though the results are undesirable. However, substituting # for space in a gap-separator will show up as errors in the automated processing of the EBNF.
Perhaps the ambiguity can be resolved by 6.1?
No, 6.1 expresses an intent but has no definitions or rules.
Consider that 6.2 defines terminal-character to include other-character. This means that each of # and space is a terminal-character. In 6.3, terminal-character is a gap-free-symbol, but #, unlike the other symbols in 6.2, has no meaning in the standard. Furthermore, in 6.3 and 6.4, space is both a gap-free-symbol and a gap-separator. The inclusion of terminal-character in 6.3 appears to be a defect in the standard, but not the only one.
In 8.1, "The syntax of Extended BNF", there are some defects.
The following is not defined in 6.5:
(* see 6.5 *) syntax
= (gap separator},
gap free symbol, {gap separator},
{gap free symbol, {gap separator}};
There is no 6.9 for the following:
(* see 6.9 *) syntax
= {bracketed textual comment},
commentless symbol,
{bracketed textual comment},
{commentless symbol,
{bracketed textual comment)};
References to 6.6 through 6.8 are incorrectly numbered and should be 6.5 through 6.7, respectively.
Thank you, I did not know whether it was a defect or my misinterpretation.
– Emil, Nov 21 '18 at 7:23
answered Nov 21 '18 at 1:35 by Rick Smith
edited Nov 22 '18 at 12:58