EBNF - other character or space character?

I wonder if there is any way to know if a character is an other character or space character beforehand in EBNF? Right now I lex every possible variant at each position in the source string, but having to try all possible interpretations is a headache, especially if I also have to try all possible production rules before knowing whether a character is an other character or a space character.

To clarify: according to ISO/IEC 14977, the space, ' ', is both a space character and an other character. I want to know whether there is an easier way to determine which one it is than brute-forcing every possible interpretation of the source string.

2018-01-06:
Perhaps the ambiguity can be resolved by 6.1? The text implicitly says that gap-separators have higher priority than other-characters outside terminal strings, because otherwise they would be part of the syntax? Or perhaps it defines an equivalence class of syntaxes, modulo the space character, or something like that...

edited Jan 6 '18 at 16:08

asked Dec 21 '17 at 23:30

Emil

1327

EBNF as in Extended Backus-Naur Form? Lexers normally classify characters by a simple table lookup or call to a friendly library function which performs the table lookup; for example, what's wrong with isspace(3)?

– AlexP
Dec 21 '17 at 23:58

AlexP: the problem is not checking whether it is a space; I do that with a regex. The problem is that EBNF does not map one-to-one from source characters to its symbols, at least not in a way that is obvious to me.

– Emil
Dec 22 '17 at 7:12

Most computer languages are described by two grammars. One grammar describes lexical elements, such as numbers, identifiers, keywords, separators, operators, and comments; this grammar is implemented by a lexical analyzer (aka lexer). The lexer converts the source code into an intermediate form whose terminal symbols are annotated lexical elements, which is then parsed by the syntactical analyzer (aka parser), which implements the second, higher-level grammar. For example, the lexer converts x2□□=□a+□/*□comment*/□□b; into IDENT(x2) OP(=) IDENT(a) OP(+) IDENT(b) SEP(;).

– AlexP
Dec 22 '17 at 13:07
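
The lexer stage described in the comment above could be sketched roughly as follows. This is only an illustration for a toy language: the token class names (IDENT, OP, SEP) follow the comment, while `TOKEN_SPEC`, `MASTER`, and `lex` are hypothetical names chosen for this sketch and are not part of any standard.

```python
import re

# Hypothetical token classes for a toy language (illustrative only).
# COMMENT is listed before OP so that "/*" is not lexed as the "/" operator.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),     # block comment, discarded
    ("SPACE",   r"\s+"),           # whitespace, discarded
    ("IDENT",   r"[A-Za-z_]\w*"),  # identifiers such as x2, a, b
    ("OP",      r"[=+\-*/]"),      # single-character operators
    ("SEP",     r";"),             # statement separator
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)

def lex(source):
    """Turn source text into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        # Whitespace and comments are consumed but never reach the parser.
        if match.lastgroup not in ("COMMENT", "SPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `lex("x2  = a+ /* comment */  b;")` yields the six tokens listed in the comment. The parser then works only on those tokens and never sees a raw space.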


1 Answer
I wonder if there is any way to know if a character is an other character
or space character beforehand in EBNF?

Yes, an other-character (including space) may appear in a terminal-string (4.17, 4.18), special-sequence (4.20), or bracketed-textual-comment (6.6). Other than that, a space is a gap-separator (6.4, 7.6).

This can be seen by substituting a different other-character, such as #, for space. In the cases mentioned (terminal-string, special-sequence, and bracketed-textual-comment), the substitution makes no effective difference to the automated processing of the EBNF, though the results are undesirable. Substituting # for a space in a gap-separator, however, will produce errors in the automated processing of the EBNF.
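
As a rough illustration of the rule above, a single left-to-right pass that tracks the current context is enough to decide the role of each space, with no need to try every interpretation. The sketch below is not taken from the standard: `classify_spaces` is a hypothetical helper, it assumes the standard's default delimiters (' and " for terminal-strings, ? for special-sequences, (* and *) for comments), and it simplifies by not recognising quoted strings inside comments and by not treating the sequence (*) specially.

```python
def classify_spaces(ebnf):
    """Return a list of (index, role) pairs, one per space in `ebnf`.

    The role is "other-character" inside a terminal-string,
    special-sequence, or comment, and "gap-separator" elsewhere.
    """
    roles = []
    closer = None  # closing delimiter of the open '...', "..." or ?...?
    depth = 0      # nesting depth of (* ... *) comments
    i = 0
    while i < len(ebnf):
        ch = ebnf[i]
        two = ebnf[i:i + 2]
        if closer is not None:
            # Inside a terminal-string or special-sequence.
            if ch == closer:
                closer = None
            elif ch == " ":
                roles.append((i, "other-character"))
        elif depth > 0:
            # Inside a (possibly nested) comment.
            if two == "(*":
                depth += 1
                i += 1
            elif two == "*)":
                depth -= 1
                i += 1
            elif ch == " ":
                roles.append((i, "other-character"))
        elif two == "(*":
            depth = 1
            i += 1
        elif ch in ("'", '"', "?"):
            closer = ch
        elif ch == " ":
            roles.append((i, "gap-separator"))
        i += 1
    return roles
```

For the input `a = 'x y' ;` the spaces at indices 1, 3, and 9 are reported as gap-separators and the space at index 6, inside the terminal-string, as an other-character.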

Perhaps the ambiguity can be resolved by 6.1?

No, 6.1 expresses an intent but has no definitions or rules.

Consider that 6.2 defines terminal-character to include other-character. This means that each of # and space is a terminal character. In 6.3, terminal-character is a gap-free-symbol, but #, unlike the other symbols in 6.2, has no meaning in the standard. Furthermore, in 6.3 and 6.4, space is both a gap-free-symbol and a gap-separator. The inclusion of terminal-character in 6.3 appears to be a defect in the standard, but not the only one.

In 8.1, "The syntax of Extended BNF", there are some defects.

The following is not defined in 6.5:

(* see 6.5 *) syntax
  = {gap separator},
    gap free symbol, {gap separator},
    {gap free symbol, {gap separator}};

There is no 6.9 for the following:

(* see 6.9 *) syntax
  = {bracketed textual comment},
    commentless symbol,
    {bracketed textual comment},
    {commentless symbol,
    {bracketed textual comment}};

References to 6.6 through 6.8 are incorrectly numbered and should be 6.5 through 6.7, respectively.

edited Nov 22 '18 at 12:58

answered Nov 21 '18 at 1:35

Rick Smith

1,7673616

Thank you, did not know if it was a defect or my misinterpretation.

– Emil
Nov 21 '18 at 7:23
