How to construct an abstract syntax tree











up vote
56
down vote

favorite
23












I have a general idea of what an AST is, but I want to know how to construct one.



If you're given a grammar and a parse tree, how do you construct the AST?



How do you do it if you're given a grammar and an expression?










share|improve this question


















  • 12




    The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
    – Ira Baxter
    Aug 3 '14 at 17:29















up vote
56
down vote

favorite
23












I have a general idea of what an AST is, but I want to know how to construct one.



If you're given a grammar and a parse tree, how do you construct the AST?



How do you do it if you're given a grammar and an expression?










share|improve this question


















  • 12




    The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
    – Ira Baxter
    Aug 3 '14 at 17:29













up vote
56
down vote

favorite
23









up vote
56
down vote

favorite
23






23





I have a general idea of what an AST is, but I want to know how to construct one.



If you're given a grammar and a parse tree, how do you construct the AST?



How do you do it if you're given a grammar and an expression?










share|improve this question













I have a general idea of what an AST is, but I want to know how to construct one.



If you're given a grammar and a parse tree, how do you construct the AST?



How do you do it if you're given a grammar and an expression?







abstract-syntax-tree






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 12 '09 at 11:24









neuromancer

17.4k70146205




17.4k70146205








  • 12




    The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
    – Ira Baxter
    Aug 3 '14 at 17:29














  • 12




    The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
    – Ira Baxter
    Aug 3 '14 at 17:29








12




12




The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
– Ira Baxter
Aug 3 '14 at 17:29




The "answer" provided here by HS is confusing and actualy doesn't address the question directly. This question actually has an answer here: stackoverflow.com/a/25106688/120163
– Ira Baxter
Aug 3 '14 at 17:29












2 Answers
2






active

oldest

votes

















up vote
42
down vote



accepted










Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.



Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.



To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse treeast from it.



So the expression 1 + 2*(3+4) might be split into a list of tokens like this:



1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen


The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.



So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)



How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.




  • Here's a tutorial that shows you how to build your own recursive
    descent parser.


  • Coco is a parser generator for various languages, which also
    comes with documentation on how to get started.


  • If Python's your thing, then pyparsing maybe for you.







share|improve this answer



















  • 29




    "To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
    – Ira Baxter
    May 24 '14 at 17:08












  • please specify some pointers as to how you convert your grammar into working code
    – Rishul Matta
    Oct 30 '14 at 3:06


















up vote
3
down vote













I will answer this from a general perspective, without trying to talk about lexers and parsers.



A parse tree contains non-terminal symbols that are part of a context free grammar, and show the chain of productions to obtain a string consisting of terminal symbols, either recursively or not. So when you have the parse tree you don't need the grammar - you can derive the grammar from the parse tree.



An AST does not contain any non-terminal symbols. It only contains symbols.



Example:



 E
|
E + T
| |
T M * M
| | |
M a b
|
a


Which is a very quick version of showing a+a*b. Note, the way the abstract syntax tree is interpreted depends on the precedence of the tree, what type of traversal you do (in-order, pre-order, post-order) This would be a general function you code into your search tree. However, in general, the AST for that parse tree might look like this:



   +
| |
a *
| |
a b





share|improve this answer



















  • 1




    Cool, but how does one construct an AST?
    – ForceBru
    Jul 3 at 15:58











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f1721553%2fhow-to-construct-an-abstract-syntax-tree%23new-answer', 'question_page');
}
);

Post as a guest
































2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
42
down vote



accepted










Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.



Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.



To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse treeast from it.



So the expression 1 + 2*(3+4) might be split into a list of tokens like this:



1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen


The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.



So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)



How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.




  • Here's a tutorial that shows you how to build your own recursive
    descent parser.


  • Coco is a parser generator for various languages, which also
    comes with documentation on how to get started.


  • If Python's your thing, then pyparsing maybe for you.







share|improve this answer



















  • 29




    "To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
    – Ira Baxter
    May 24 '14 at 17:08












  • please specify some pointers as to how you convert your grammar into working code
    – Rishul Matta
    Oct 30 '14 at 3:06















up vote
42
down vote



accepted










Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.



Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.



To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse treeast from it.



So the expression 1 + 2*(3+4) might be split into a list of tokens like this:



1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen


The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.



So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)



How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.




  • Here's a tutorial that shows you how to build your own recursive
    descent parser.


  • Coco is a parser generator for various languages, which also
    comes with documentation on how to get started.


  • If Python's your thing, then pyparsing maybe for you.







share|improve this answer



















  • 29




    "To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
    – Ira Baxter
    May 24 '14 at 17:08












  • please specify some pointers as to how you convert your grammar into working code
    – Rishul Matta
    Oct 30 '14 at 3:06













up vote
42
down vote



accepted







up vote
42
down vote



accepted






Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.



Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.



To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse treeast from it.



So the expression 1 + 2*(3+4) might be split into a list of tokens like this:



1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen


The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.



So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)



How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.




  • Here's a tutorial that shows you how to build your own recursive
    descent parser.


  • Coco is a parser generator for various languages, which also
    comes with documentation on how to get started.


  • If Python's your thing, then pyparsing maybe for you.







share|improve this answer














Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.



Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.



To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse treeast from it.



So the expression 1 + 2*(3+4) might be split into a list of tokens like this:



1 - int
+ - add_operator
2 - int
* - mul_operator
( - lparen
3 - int
+ - add_operator
4 - int
) - rparen


The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.



So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)



How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.




  • Here's a tutorial that shows you how to build your own recursive
    descent parser.


  • Coco is a parser generator for various languages, which also
    comes with documentation on how to get started.


  • If Python's your thing, then pyparsing maybe for you.








share|improve this answer














share|improve this answer



share|improve this answer








edited Oct 4 at 13:58









Floyd

384311




384311










answered Nov 12 '09 at 11:46









HS.

10.6k73346




10.6k73346








  • 29




    "To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
    – Ira Baxter
    May 24 '14 at 17:08












  • please specify some pointers as to how you convert your grammar into working code
    – Rishul Matta
    Oct 30 '14 at 3:06














  • 29




    "To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
    – Ira Baxter
    May 24 '14 at 17:08












  • please specify some pointers as to how you convert your grammar into working code
    – Rishul Matta
    Oct 30 '14 at 3:06








29




29




"To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
– Ira Baxter
May 24 '14 at 17:08






"To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code." This so confusing by itself that this answer should be removed. The rest of this "answer" doesn't really tell op how to build a syntax tree; it simply waves its hands at some tools that might be helpful, had the author actually answered the question.
– Ira Baxter
May 24 '14 at 17:08














please specify some pointers as to how you convert your grammar into working code
– Rishul Matta
Oct 30 '14 at 3:06




please specify some pointers as to how you convert your grammar into working code
– Rishul Matta
Oct 30 '14 at 3:06












up vote
3
down vote













I will answer this from a general perspective, without trying to talk about lexers and parsers.



A parse tree contains non-terminal symbols that are part of a context free grammar, and show the chain of productions to obtain a string consisting of terminal symbols, either recursively or not. So when you have the parse tree you don't need the grammar - you can derive the grammar from the parse tree.



An AST does not contain any non-terminal symbols. It only contains symbols.



Example:



 E
|
E + T
| |
T M * M
| | |
M a b
|
a


Which is a very quick version of showing a+a*b. Note, the way the abstract syntax tree is interpreted depends on the precedence of the tree, what type of traversal you do (in-order, pre-order, post-order) This would be a general function you code into your search tree. However, in general, the AST for that parse tree might look like this:



   +
| |
a *
| |
a b





share|improve this answer



















  • 1




    Cool, but how does one construct an AST?
    – ForceBru
    Jul 3 at 15:58















up vote
3
down vote













I will answer this from a general perspective, without trying to talk about lexers and parsers.



A parse tree contains non-terminal symbols that are part of a context free grammar, and show the chain of productions to obtain a string consisting of terminal symbols, either recursively or not. So when you have the parse tree you don't need the grammar - you can derive the grammar from the parse tree.



An AST does not contain any non-terminal symbols. It only contains symbols.



Example:



 E
|
E + T
| |
T M * M
| | |
M a b
|
a


Which is a very quick version of showing a+a*b. Note, the way the abstract syntax tree is interpreted depends on the precedence of the tree, what type of traversal you do (in-order, pre-order, post-order) This would be a general function you code into your search tree. However, in general, the AST for that parse tree might look like this:



   +
| |
a *
| |
a b





share|improve this answer



















  • 1




    Cool, but how does one construct an AST?
    – ForceBru
    Jul 3 at 15:58













up vote
3
down vote










up vote
3
down vote









I will answer this from a general perspective, without trying to talk about lexers and parsers.



A parse tree contains non-terminal symbols that are part of a context free grammar, and show the chain of productions to obtain a string consisting of terminal symbols, either recursively or not. So when you have the parse tree you don't need the grammar - you can derive the grammar from the parse tree.



An AST does not contain any non-terminal symbols. It only contains symbols.



Example:



 E
|
E + T
| |
T M * M
| | |
M a b
|
a


Which is a very quick version of showing a+a*b. Note, the way the abstract syntax tree is interpreted depends on the precedence of the tree, what type of traversal you do (in-order, pre-order, post-order) This would be a general function you code into your search tree. However, in general, the AST for that parse tree might look like this:



   +
| |
a *
| |
a b





share|improve this answer














I will answer this from a general perspective, without trying to talk about lexers and parsers.



A parse tree contains non-terminal symbols that are part of a context free grammar, and show the chain of productions to obtain a string consisting of terminal symbols, either recursively or not. So when you have the parse tree you don't need the grammar - you can derive the grammar from the parse tree.



An AST does not contain any non-terminal symbols. It only contains symbols.



Example:



 E
|
E + T
| |
T M * M
| | |
M a b
|
a


Which is a very quick version of showing a+a*b. Note, the way the abstract syntax tree is interpreted depends on the precedence of the tree, what type of traversal you do (in-order, pre-order, post-order) This would be a general function you code into your search tree. However, in general, the AST for that parse tree might look like this:



   +
| |
a *
| |
a b






share|improve this answer














share|improve this answer



share|improve this answer








edited Jan 24 at 19:20









ThatsJustCheesy

466615




466615










answered May 9 '16 at 10:07









Dhruv Ghulati

1,56321630




1,56321630








  • 1




    Cool, but how does one construct an AST?
    – ForceBru
    Jul 3 at 15:58














  • 1




    Cool, but how does one construct an AST?
    – ForceBru
    Jul 3 at 15:58








1




1




Cool, but how does one construct an AST?
– ForceBru
Jul 3 at 15:58




Cool, but how does one construct an AST?
– ForceBru
Jul 3 at 15:58


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f1721553%2fhow-to-construct-an-abstract-syntax-tree%23new-answer', 'question_page');
}
);

Post as a guest




















































































Popular posts from this blog

Guess what letter conforming each word

Run scheduled task as local user group (not BUILTIN)

Port of Spain