EBNF - other character or space character?












I wonder if there is any way to know if a character is an other character or space character beforehand in EBNF? Right now I lex every possible variant at each position in the source string, but having to try all possible interpretations gives me a headache, especially if I also have to try all possible production rules before knowing whether a character is an other character or a space character.

To clarify: the space, ' ', is both a space character and an other character according to ISO/IEC 14977. I want to know whether there is an easier way to decide which one it is than brute-forcing every possible interpretation of the source string.

2018-01-06:
Perhaps the ambiguity can be resolved by 6.1? The text implicitly says that gap-separators have higher priority than other-characters outside terminal strings, because otherwise they would be part of the syntax? Or perhaps it defines an equivalence class of syntaxes, modulo space characters, or something like that...










  • EBNF as in Extended Backus-Naur Form? Lexers normally classify characters by a simple table lookup or a call to a friendly library function which performs the table lookup; for example, what's wrong with isspace(3)?

    – AlexP
    Dec 21 '17 at 23:58

  • AlexP: the problem is not checking whether it is a space; I do that with a regex. The problem is that EBNF is not one-to-one from the source characters to its symbols, at least not in a way that is obvious to me.

    – Emil
    Dec 22 '17 at 7:12

  • Most computer languages are described by two grammars. One grammar describes lexical elements, such as numbers, identifiers, keywords, separators, operators, and comments; this grammar is implemented by a lexical analyzer (aka lexer). The lexer converts the source code into an intermediate form whose end symbols are annotated lexical elements; that form is then parsed by the syntactical analyzer (aka parser), which implements the second, higher-level grammar. For example, the lexer converts x2□□=□a+□/*□comment*/□□b; into IDENT(x2) OP(=) IDENT(a) OP(+) IDENT(b) SEP(;).

    – AlexP
    Dec 22 '17 at 13:07
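
The lexer stage described in the last comment could be sketched roughly as follows. This is only an illustration in Python, not anything from ISO/IEC 14977: the token names IDENT, OP and SEP follow the comment's example, while COMMENT, SPACE, the regular expressions, and the function name lex are assumptions made for the sketch.

```python
import re

# Hypothetical token classes for the toy language in the comment above.
# Order matters: comments and whitespace are tried before the other classes.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),      # C-style comment, discarded
    ("SPACE", r"\s+"),              # whitespace, discarded
    ("IDENT", r"[A-Za-z_]\w*"),     # identifiers such as x2, a, b
    ("OP", r"[=+]"),                # only the two operators used in the example
    ("SEP", r";"),                  # statement separator
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def lex(source):
    """Turn source text into a list of (kind, text) tokens, dropping gaps."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(lex("x2  = a+ /* comment */  b;"))
```

The parser then only ever sees the annotated tokens, so whitespace never reaches the higher-level grammar.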


















ebnf






edited Jan 6 '18 at 16:08
Emil

asked Dec 21 '17 at 23:30
Emil

1 Answer
I wonder if there is any way to know if a character is an other character
or space character beforehand in EBNF?




Yes, an other-character (including space) may appear in a terminal-string (4.17, 4.18), special-sequence (4.20), or bracketed-textual-comment (6.6). Other than that, a space is a gap-separator (6.4, 7.6).



This may be seen by substituting a different other-character, such as #, for space. In the cases mentioned (terminal-string, special-sequence, and bracketed-textual-comment), there is no effective change to the automated processing of the EBNF, though the results are undesirable. However, substituting # for space in a gap-separator will produce errors in the automated processing of the EBNF.
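
As a rough illustration of that rule, here is a minimal Python sketch that walks an EBNF source once and labels each space by the context it sits in. It is not taken from the standard: the function name classify_spaces is hypothetical, and it assumes that comments are not nested and that quote characters inside comments are ordinary text, both of which simplify what ISO/IEC 14977 actually allows.

```python
def classify_spaces(source):
    """Return (index, role) for every space character in an EBNF source.

    Simplifying assumptions (not from the standard): comments do not nest,
    and quotes or '?' inside a comment do not open a string or sequence.
    """
    roles = []
    context = None  # None, "'", '"', "?" or "comment"
    i = 0
    while i < len(source):
        ch = source[i]
        if context is None:
            if ch in ("'", '"', "?"):
                context = ch                      # terminal-string or special-sequence opens
            elif source.startswith("(*", i):
                context = "comment"               # bracketed-textual-comment opens
                i += 2
                continue
            elif ch == " ":
                roles.append((i, "gap-separator"))
        elif context == "comment":
            if source.startswith("*)", i):
                context = None
                i += 2
                continue
            if ch == " ":
                roles.append((i, "other-character"))
        else:
            if ch == context:                     # matching closing quote or '?'
                context = None
            elif ch == " ":
                roles.append((i, "other-character"))
        i += 1
    return roles


for index, role in classify_spaces("a = 'x y' ; (* c d *)"):
    print(index, role)
```

With a single pass like this, no backtracking over alternative interpretations is needed: the enclosing construct alone decides the role of each space.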




Perhaps the ambiguity can be resolved by 6.1?




No, 6.1 expresses an intent but has no definitions or rules.



Consider that 6.2 defines terminal-character to include other-character. This means that each of # and space is a terminal character. In 6.3, terminal-character is a gap-free-symbol, but #, unlike the other symbols in 6.2, has no meaning in the standard. Furthermore, in 6.3 and 6.4, space is both a gap-free-symbol and a gap-separator. The inclusion of terminal-character in 6.3 appears to be a defect in the standard, but not the only one.



In 8.1, "The syntax of Extended BNF", there are some defects.



The following is not defined in 6.5:



(* see 6.5 *) syntax
    = {gap separator},
      gap free symbol, {gap separator},
      {gap free symbol, {gap separator}};


There is no 6.9 for the following:



(* see 6.9 *) syntax
    = {bracketed textual comment},
      commentless symbol,
      {bracketed textual comment},
      {commentless symbol,
      {bracketed textual comment}};


References to 6.6 through 6.8 are incorrectly numbered and should be 6.5 through 6.7, respectively.






  • Thank you, did not know if it was a defect or my misinterpretation.

    – Emil
    Nov 21 '18 at 7:23











edited Nov 22 '18 at 12:58

answered Nov 21 '18 at 1:35
Rick Smith
