ANTLR Tutorial

COS 382 - Language Structures

notes

Two major-ish changes in ANTLR4 from ANTLR3:

Accepts nearly any grammar you throw at it, even one with direct left recursion. (trivia: this version is named honey badger)
Discourages embedding actions directly in the grammar. Instead, use listeners and visitors.

See link for details on installing and setting up ANTLR4 in IntelliJ.

Grammar—Arithmetic expressions

Simple E?BNF

\[\begin{array}{rcl} \textit{prog} & \rightarrow & \textit{statements} \\ \textit{statements} & \rightarrow & \textit{statement} \\ & & |\ \textit{statement}\ \ \textit{statements} \\ \textit{statement} & \rightarrow & \textit{expr} \end{array}\]

arithmetic expressions grammar

\[\begin{array}{rcl} \gnonterm{PROG} & \rightarrow{} & \gnonterm{STAT}*\ \gterm{eof} \\ \gnonterm{STAT} & \rightarrow{} & \gnonterm{EXPR}\ \gterm{newline} \\ & | & \gterm{id}\ \gliteral{=}\ \gnonterm{EXPR}\ \gterm{newline} \\ & | & \gterm{newline} \\ \gnonterm{EXPR} & \rightarrow{} & \gnonterm{MEXPR}\ ((\gliteral{+}\ |\ \gliteral{-})\ \gnonterm{MEXPR})* \\ \gnonterm{MEXPR} & \rightarrow{} & \gnonterm{ATOM}\ (\gliteral{*}\ \gnonterm{ATOM})* \\ \gnonterm{ATOM} & \rightarrow{} & \gterm{int}\ |\ \gterm{id}\ |\ \gliteral{(}\ \gnonterm{EXPR}\ \gliteral{)} \\ \gterm{id} & \rightarrow{} & (\gliteral{a}\ |\ \gliteral{b}\ |\ ...\ |\ \gliteral{z}\ |\ \gliteral{A}\ |\ ...\ |\ \gliteral{Z})+ \\ \gterm{int} & \rightarrow{} & (\gliteral{0}\ |\ ...\ |\ \gliteral{9})+ \\ \gterm{newline} & \rightarrow{} & \gliteral{\r}?\ \gliteral{\n} \\ \gterm{ws} & \rightarrow{} & (\gliteral{ }\ |\ \gliteral{\t})+\ \ \gcmd{/* skip */} \\ \end{array}\]

antlr grammar

grammar Expr;   // note: must be same as filename (Expr.g4)

prog: stat+ ;

stat: expr NEWLINE
    | ID '=' expr NEWLINE
    | NEWLINE
    ;

expr     : multExpr (('+' | '-') multExpr)* ;
multExpr : atom ('*' atom)* ;
atom: INT
    | ID
    | '(' expr ')'
    ;

ID      : [a-zA-Z]+ ;
INT     : [0-9]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;  // tells ANTLR to ignore these
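
To get a feel for what the four lexer rules do, here is a minimal plain-Java sketch that splits input into the same kinds of tokens. It is not ANTLR code: MiniLexer, its tokenize method, and the OP token kind (standing in for the literals '+', '-', '*', '=', '(' and ')') are hypothetical names made up for illustration. It tries the rules in the order written, which gives the same result as ANTLR's longest-match behavior here only because these rules never overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written stand-in for the generated ExprLexer; not ANTLR code.
class MiniLexer {
    // One named group per lexer rule, plus OP for the literals used in parser rules.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ID>[a-zA-Z]+)|(?<INT>[0-9]+)|(?<NEWLINE>\\r?\\n)|(?<WS>[ \\t]+)|(?<OP>[-+*=()])");

    private static final String[] KINDS = {"ID", "INT", "NEWLINE", "WS", "OP"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    if (!kind.equals("WS")) { // mirrors "-> skip": blanks produce no token
                        tokens.add(kind.equals("NEWLINE") ? kind : kind + ":" + m.group(kind));
                    }
                    break;
                }
            }
            pos = m.end(); // every rule consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:x, OP:=, INT:1, OP:+, INT:2, NEWLINE]
        System.out.println(tokenize("x = 1 + 2\n"));
    }
}
```

Note that the blanks never reach the parser, which is why the parser rules do not mention WS at all.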

Antlr plugin for intellij

The ANTLR plugin turns your grammar file into a Java lexer and parser (with additional machinery): right-click the Expr.g4 file and choose Generate ANTLR Recognizer.

The ANTLR plugin for IntelliJ generates the following files in the gen/ folder:

Expr.tokens (ExprLexer.tokens)
ExprLexer.java, ExprParser.java
ExprListener.java, ExprVisitor.java
ExprBaseListener.java, ExprBaseVisitor.java

IntelliJ does not treat these files as source code, though. Either move the Java files to the src/ folder or mark the gen/ folder as a Sources Root.

exprparser.java pseudocode

prog: stat+ ;

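public class ExprParser extends Parser {
    public final void prog() throws RecognitionException {
        //            ^^^^ - start symbol!
        try {
            do {
                stat();
            } while (/* next token can start another stat */);
        } catch (RecognitionException re) {
            // ANTLR4 delegates error reporting and recovery to an error handler
            _errHandler.reportError(this, re);
            _errHandler.recover(this, re);
        }
    }
}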

Note: Java has been simplified for presentation

exprparser.java pseudocode

multExpr: atom ('*' atom)* ;

public class ExprParser extends Parser {
    void multExpr() {
        try {
            atom();
            while (/* next token is '*' */) {
                match('*');
                atom();
            }
        } catch (RecognitionException re) {
            // ANTLR4 delegates error reporting and recovery to an error handler
            _errHandler.reportError(this, re);
            _errHandler.recover(this, re);
        }
    }
}

Note: Java has been simplified for presentation
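
The "one method per rule" idea in the pseudocode above can be tried out without ANTLR. The following is a minimal hand-written sketch, assuming a crude regex tokenizer and plain strings as tokens; MiniRecognizer and accepts are hypothetical names, and the real generated ExprParser is more elaborate (it builds a parse tree, predicts alternatives differently, and recovers from errors instead of giving up).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written recognizer for the Expr grammar; not ANTLR output.
class MiniRecognizer {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    MiniRecognizer(String input) {
        // Crude tokenizer: ID, INT, NEWLINE, or any other single non-blank character.
        Matcher m = Pattern.compile("[a-zA-Z]+|[0-9]+|\\r?\\n|[^ \\t]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private boolean atNewline() { return peek().equals("\n") || peek().equals("\r\n"); }

    private void match(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // prog: stat+ ;
    void prog() {
        do {
            stat();
        } while (pos < tokens.size());
    }

    // stat: expr NEWLINE | ID '=' expr NEWLINE | NEWLINE ;
    void stat() {
        if (!atNewline()) {
            // Two tokens of lookahead tell an assignment from a bare expression.
            if (peek().matches("[a-zA-Z]+") && pos + 1 < tokens.size()
                    && tokens.get(pos + 1).equals("=")) {
                pos += 2; // consume ID '='
            }
            expr();
        }
        if (!atNewline()) throw new IllegalStateException("expected NEWLINE but found " + peek());
        pos++;
    }

    // expr: multExpr (('+' | '-') multExpr)* ;
    void expr() {
        multExpr();
        while (peek().equals("+") || peek().equals("-")) {
            pos++;
            multExpr();
        }
    }

    // multExpr: atom ('*' atom)* ;
    void multExpr() {
        atom();
        while (peek().equals("*")) {
            match("*");
            atom();
        }
    }

    // atom: INT | ID | '(' expr ')' ;
    void atom() {
        if (peek().matches("[0-9]+|[a-zA-Z]+")) {
            pos++;
        } else {
            match("(");
            expr();
            match(")");
        }
    }

    static boolean accepts(String input) {
        try {
            new MiniRecognizer(input).prog();
            return true;
        } catch (IllegalStateException e) {
            return false; // a real parser would report the error and try to recover
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("2 + 3\nx = 1 + 2 * 3 + (a)\n")); // true
        System.out.println(accepts("2 + * 3\n"));                      // false
    }
}
```

Each loop in the code corresponds to a * or + in the grammar, and each if/else corresponds to a choice between alternatives.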

test program

A simple test program

import org.antlr.v4.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        CharStream input = CharStreams.fromFileName("inputs/test.txt");
        ExprLexer lexer = new ExprLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExprParser parser = new ExprParser(tokens);

        parser.prog(); // parse the input stream!
    }
}

test file

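Create a folder in your IntelliJ project called inputs, and then create a file in it called test.txt. Mark the inputs folder as Resources Root. Every line of the file, including the last, must end with a newline, because each non-empty stat ends in NEWLINE.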

2 + 3
x = 1 + 2 * 3 + (a)

evaluating the expression

The lexer builds a token stream from the input, and the parser checks that stream's syntax against the supplied grammar.

To create a translator or interpreter from the recognizer, the meaning of the valid token stream must be expressed.

Two approaches:

Use ANTLR Listener
Use ANTLR Visitor

listener vs visitor

Listener: interface that responds to events triggered by the built-in tree walker. Callbacks are called when rules are entered (pre-order) or exited (post-order) while the walker traverses the entire tree.
Visitor: interface that lets you control the tree walk yourself. The children of a node are visited only when you explicitly call the visit method on them.
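
The difference is easier to see on a tiny hand-built tree. The sketch below is plain Java and does not use ANTLR's classes: WalkDemo, Node, Listener, walk, eval and trace are hypothetical names chosen for illustration. The listener half only receives enter and exit events from a walker that owns the traversal; the visitor half decides for itself when to descend into the children and returns a value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the two patterns; these are NOT ANTLR's classes.
class WalkDemo {
    // A tiny tree node: a leaf holding an integer, or an operator with two children.
    static class Node {
        final String label;
        final Node left, right;
        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
        Node(String label) { this(label, null, null); }
        boolean isLeaf() { return left == null; }
    }

    // Listener style: callbacks only, no control over the traversal.
    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    // The walker owns the traversal and always visits the whole tree.
    static void walk(Node n, Listener listener) {
        listener.enter(n);                 // pre-order event
        if (!n.isLeaf()) {
            walk(n.left, listener);
            walk(n.right, listener);
        }
        listener.exit(n);                  // post-order event
    }

    // Visitor style: the code itself calls eval on the children and combines results.
    static int eval(Node n) {
        if (n.isLeaf()) return Integer.parseInt(n.label);
        int l = eval(n.left);
        int r = eval(n.right);
        switch (n.label) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            default: throw new IllegalArgumentException("unknown operator " + n.label);
        }
    }

    // Records the order in which the walker fires listener events.
    static List<String> trace(Node root) {
        List<String> events = new ArrayList<>();
        walk(root, new Listener() {
            public void enter(Node n) { events.add("enter " + n.label); }
            public void exit(Node n) { events.add("exit " + n.label); }
        });
        return events;
    }

    public static void main(String[] args) {
        // Tree for 1 + 2 * 3
        Node tree = new Node("+", new Node("1"),
                new Node("*", new Node("2"), new Node("3")));
        System.out.println(trace(tree)); // enter +, enter 1, exit 1, enter *, ...
        System.out.println(eval(tree));  // 7
    }
}
```

A listener that wants to compute a value has to keep its own state (for example a stack) between callbacks, whereas a visitor can simply return results up the tree, which is why calculators are usually written as visitors.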

The books Language Implementation Patterns and The Definitive ANTLR 4 Reference discuss how to implement listeners and visitors.

calculator example

Here is an ANTLR4 example of a calculator on GitHub.

https://github.com/shmatov/antlr4-calculator

In this example, the input is parsed, and then a "visitor" will traverse the parse tree to evaluate the expressions.