Recursive Descent Parser

Code
Main Pgm
Parser Defs and Impl
Scanner Defs and Impl
Utilities
Source Pkg

Here is the source code for a simple example of a recursive descent parser. It parses a small grammar for an assignment statement:

Asst → Id := Expr ;

Expr → Term { [ + | - ] Term }*

Term → Fact { [ * | / ] Fact }*

Fact → Id | Lit | ( Expr )

The tokens are the parentheses, the operators +, -, * and /, the assignment symbol :=, the semicolon, Id, and Lit. An Id is a letter followed by any number of letters and digits, and a Lit is a sequence of digits.

When run with a legal assignment statement as input, it prints a representation of the statement's structure. For instance:

[bennet@desktop simparse]$ ./parser
mike := 2 * ( fred + 8 - 10*joe) + 1 ;
[ Asst:
  [ Variable: tok_id(mike)]
  [ Binary, op = tok_plus(+):
    [ Binary, op = tok_splat(*):
      [ Value: tok_lit(2)]
      [ Binary, op = tok_minus(-):
        [ Binary, op = tok_plus(+):
          [ Variable: tok_id(fred)]
          [ Value: tok_lit(8)]
        ]
        [ Binary, op = tok_splat(*):
          [ Value: tok_lit(10)]
          [ Variable: tok_id(joe)]
        ]
      ]
    ]
    [ Value: tok_lit(1)]
  ]
]

Note: When providing input from the keyboard as above, be sure to end with the EOF marker (usually ^D in Unix, ^Z on Windows).

The Scanner object (hdr, imple) breaks the input up into tokens. Each token is a name, an integer, or an operator or punctuation symbol. The Parser object contains a Scanner object, which supplies it with the next input token on request.
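As a rough illustration of the scanning step, the sketch below splits an input string into tokens following the Id and Lit rules given earlier. The names Tok, Token and scan are hypothetical and are not taken from the linked source. Unlike the Scanner object described above, this version tokenizes the whole input at once instead of supplying one token at a time.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Token kinds for the grammar; these names are illustrative assumptions.
enum class Tok { Id, Lit, Assign, Semi, Plus, Minus, Star, Slash, LParen, RParen, End };

struct Token {
    Tok kind;
    std::string text;
};

// Split the input into tokens, skipping whitespace between them.
std::vector<Token> scan(const std::string& in) {
    auto ch = [&](std::size_t k) { return static_cast<unsigned char>(in[k]); };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (std::isspace(ch(i))) { ++i; continue; }
        std::size_t j = i;
        if (std::isalpha(ch(i))) {            // Id: a letter, then letters or digits
            while (j < in.size() && std::isalnum(ch(j))) ++j;
            out.push_back({Tok::Id, in.substr(i, j - i)});
        } else if (std::isdigit(ch(i))) {     // Lit: one or more digits
            while (j < in.size() && std::isdigit(ch(j))) ++j;
            out.push_back({Tok::Lit, in.substr(i, j - i)});
        } else if (in.compare(i, 2, ":=") == 0) {
            j = i + 2;
            out.push_back({Tok::Assign, ":="});
        } else {                              // single-character symbols
            Tok kind;
            switch (in[i]) {
                case ';': kind = Tok::Semi;   break;
                case '+': kind = Tok::Plus;   break;
                case '-': kind = Tok::Minus;  break;
                case '*': kind = Tok::Star;   break;
                case '/': kind = Tok::Slash;  break;
                case '(': kind = Tok::LParen; break;
                case ')': kind = Tok::RParen; break;
                default:
                    throw std::runtime_error(std::string("unexpected character: ") + in[i]);
            }
            j = i + 1;
            out.push_back({kind, std::string(1, in[i])});
        }
        i = j;
    }
    out.push_back({Tok::End, ""});            // sentinel so a parser can always peek
    return out;
}
```

Because an Id or Lit ends at the first character that cannot extend it, 10*joe scans as three tokens even though it contains no spaces.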

The Parser object (hdr, imple) has a matcher method for each non-terminal symbol in the grammar. A matcher method consumes the portion of the input containing the symbol it is matching, and leaves the scanner pointing to the following token. It then returns an object representing the parse tree rooted at the symbol. If the matcher does not find its symbol in the input, it throws an exception.
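One possible shape for the parse tree objects is sketched below. The class names Tree, Leaf and Binary are assumptions made for illustration; the linked source may organize its nodes differently. Each node renders itself in the bracketed notation of the sample output, on a single line.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for parse-tree nodes.
struct Tree {
    virtual ~Tree() = default;
    virtual std::string str() const = 0;
};
using TreePtr = std::unique_ptr<Tree>;

// Leaf for an identifier or a literal, e.g. "[ Variable: tok_id(fred)]".
struct Leaf : Tree {
    std::string kind;    // "Variable" or "Value"
    std::string token;   // printable token, e.g. "tok_id(fred)"
    Leaf(std::string k, std::string t) : kind(std::move(k)), token(std::move(t)) {}
    std::string str() const override { return "[ " + kind + ": " + token + "]"; }
};

// Interior node for a binary operator and its two operand subtrees.
struct Binary : Tree {
    std::string op;      // printable operator token, e.g. "tok_plus(+)"
    TreePtr left, right;
    Binary(std::string o, TreePtr l, TreePtr r)
        : op(std::move(o)), left(std::move(l)), right(std::move(r)) {}
    std::string str() const override {
        return "[ Binary, op = " + op + ": " + left->str() + " " + right->str() + " ]";
    }
};
```

A matcher for Expr or Term would return a Binary whose children are the trees returned by the matchers it called, so the tree is built bottom-up as the calls return.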

Each matcher is implemented by looking for each right-hand side defined for its symbol. The fact function checks for each of its three alternatives listed in the Fact rule, and returns the one it finds. The expr function finds as many Terms as it can, separated by + or -, and returns them in tree form.

The matchers call each other as specified in the grammar rules. The function to parse the input simply calls the matcher for Asst.
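The matcher structure described above can be sketched compactly as follows. This is not the linked source: the Node type, the string-based tokens, and the helper names (make, show, peek, take, expect) are simplifications assumed here to keep the example short, and the constructor stands in for the Scanner object.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical tree node: a label plus child subtrees.
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> kids;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(std::string label, std::vector<NodePtr> kids = {}) {
    return std::make_shared<Node>(Node{std::move(label), std::move(kids)});
}

// Render a tree on one line, e.g. "[ + [ a ] [ 1 ] ]".
std::string show(const NodePtr& n) {
    std::string s = "[ " + n->label;
    for (const auto& k : n->kids) s += " " + show(k);
    return s + " ]";
}

class Parser {
public:
    // Stand-in for the Scanner: split the text into string tokens up front.
    explicit Parser(const std::string& text) {
        std::size_t i = 0;
        while (i < text.size()) {
            if (std::isspace(at(text, i))) { ++i; continue; }
            const std::size_t start = i;
            if (std::isalpha(at(text, i)))
                while (i < text.size() && std::isalnum(at(text, i))) ++i;  // Id
            else if (std::isdigit(at(text, i)))
                while (i < text.size() && std::isdigit(at(text, i))) ++i;  // Lit
            else
                i += (text.compare(i, 2, ":=") == 0) ? 2 : 1;  // := or one symbol
            toks_.push_back(text.substr(start, i - start));
        }
        toks_.push_back("");  // empty string marks end of input
    }

    // Parsing the input is a call to the matcher for Asst.
    NodePtr parse() {
        NodePtr tree = asst();
        if (!peek().empty()) throw std::runtime_error("trailing input: " + peek());
        return tree;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    static int at(const std::string& s, std::size_t i) { return static_cast<unsigned char>(s[i]); }
    static bool isId(const std::string& t) { return !t.empty() && std::isalpha(at(t, 0)); }
    static bool isLit(const std::string& t) { return !t.empty() && std::isdigit(at(t, 0)); }
    const std::string& peek() const { return toks_[pos_]; }
    std::string take() { return toks_[pos_++]; }
    void expect(const std::string& t) {
        if (peek() != t) throw std::runtime_error("expected '" + t + "', found '" + peek() + "'");
        ++pos_;
    }

    // Asst: an Id, then :=, then an Expr, then a semicolon.
    NodePtr asst() {
        if (!isId(peek())) throw std::runtime_error("expected Id, found '" + peek() + "'");
        NodePtr target = make(take());
        expect(":=");
        NodePtr value = expr();
        expect(";");
        return make(":=", {target, value});
    }
    // Expr: a Term, then as many (+ or -) Term pairs as are present.
    NodePtr expr() {
        NodePtr left = term();
        while (peek() == "+" || peek() == "-") {
            std::string op = take();
            left = make(op, {left, term()});
        }
        return left;
    }
    // Term: a Fact, then as many (* or /) Fact pairs as are present.
    NodePtr term() {
        NodePtr left = fact();
        while (peek() == "*" || peek() == "/") {
            std::string op = take();
            left = make(op, {left, fact()});
        }
        return left;
    }
    // Fact: an Id, a Lit, or a parenthesized Expr.
    NodePtr fact() {
        if (isId(peek()) || isLit(peek())) return make(take());
        expect("(");  // anything else must open a parenthesized Expr, or this throws
        NodePtr inner = expr();
        expect(")");
        return inner;
    }
};
```

For example, show(Parser("a := b + 2 * c ;").parse()) yields "[ := [ a ] [ + [ b ] [ * [ 2 ] [ c ] ] ] ]". Folding each new operand into left inside the loop makes operators of equal precedence group to the left, which matches the nesting in the sample output, and calling term from expr gives * and / higher precedence than + and -.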