Two major-ish changes in ANTLR4 from ANTLR3:
See link for details on installing and setting up ANTLR4 in IntelliJ.
Simple E?BNF
\[\begin{array}{rcl} \textit{prog} & \rightarrow & \textit{statements} \\ \textit{statements} & \rightarrow & \textit{statement} \\ & & |\ \textit{statement}\ \ \textit{statements} \\ \textit{statement} & \rightarrow & \textit{expr} \end{array}\]
\[\begin{array}{rcl} \gnonterm{PROG} & \rightarrow{} & \gnonterm{STAT}*\ \gterm{eof} \\ \gnonterm{STAT} & \rightarrow{} & \gnonterm{EXPR}\ \gterm{newline} \\ & | & \gterm{id}\ \gliteral{=}\ \gnonterm{EXPR}\ \gterm{newline} \\ & | & \gterm{newline} \\ \gnonterm{EXPR} & \rightarrow{} & \gnonterm{MEXPR}\ ((\gliteral{+}\ |\ \gliteral{-})\ \gnonterm{MEXPR})* \\ \gnonterm{MEXPR} & \rightarrow{} & \gnonterm{ATOM}\ (\gliteral{*}\ \gnonterm{ATOM})* \\ \gnonterm{ATOM} & \rightarrow{} & \gterm{int}\ |\ \gterm{id}\ |\ \gliteral{(}\ \gnonterm{EXPR}\ \gliteral{)} \\ \gterm{id} & \rightarrow{} & (\gliteral{a}\ |\ \gliteral{b}\ |\ ...\ |\ \gliteral{z}\ |\ \gliteral{A}\ |\ ...\ |\ \gliteral{Z})+ \\ \gterm{int} & \rightarrow{} & (\gliteral{0}\ |\ ...\ |\ \gliteral{9})+ \\ \gterm{newline} & \rightarrow{} & \gliteral{\r}?\ \gliteral{\n} \\ \gterm{ws} & \rightarrow{} & (\gliteral{ }\ |\ \gliteral{\t})+\ \ \gcmd{/* skip */} \\ \end{array}\]
grammar Expr;  // note: must be same as filename (Expr.g4)

prog: stat+ ;

stat: expr NEWLINE
    | ID '=' expr NEWLINE
    | NEWLINE
    ;

expr: multExpr (('+' | '-') multExpr)* ;

multExpr: atom ('*' atom)* ;

atom: INT
    | ID
    | '(' expr ')'
    ;

ID      : [a-zA-Z]+ ;
INT     : [0-9]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;  // tells ANTLR to ignore these
ANTLR can turn your grammar file into a Java lexer and parser (with additional machinery): right-click the Expr.g4 file and choose Generate ANTLR Recognizer.
The ANTLR plugin for IntelliJ generates the following files in the gen/ folder:
Expr.tokens (ExprLexer.tokens)
ExprLexer.java, ExprParser.java
ExprListener.java, ExprVisitor.java
ExprBaseListener.java, ExprBaseVisitor.java
IntelliJ does not treat these files as source code, though.
Either move the Java files to the src/ folder or mark the gen/ folder as a Sources Root.
prog: stat+ ;
public class ExprParser extends DebugParser {
    public final void prog() throws RecognitionException {
        //            ^^^^ - start symbol!
        try {
            do {
                stat();
            } while(/*...*/);
        } catch(RecognitionException re) {
            reportError(re);
            recover(input, re);
        }
    }
}
Note: Java has been simplified for presentation
multExpr: atom ('*' atom)* ;
public class ExprParser extends DebugParser {
    void multExpr() {
        try {
            atom();
            while (next symbol is "*") {
                match('*');
                atom();
            }
        } catch (RecognitionException re) {
            reportError(re);
            recover(input, re);
        }
    }
}
Note: Java has been simplified for presentation
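The pattern in the two generated methods above (one method per rule, a loop for each `*` or `+`, a method call for each rule reference) can be tried out without ANTLR. The following is a hand-written sketch, not ANTLR's generated code: the class name `MiniExprRecognizer`, the helper names, and the use of a plain list of strings as the token stream are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hand-written recognizer for expr, multExpr and atom (illustration only, not ANTLR output). */
class MiniExprRecognizer {
    private final List<String> tokens;  // assumed token representation: one string per token
    private int pos = 0;                // index of the next unread token

    MiniExprRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns true if the whole token list forms exactly one valid expr. */
    static boolean accepts(List<String> tokens) {
        MiniExprRecognizer r = new MiniExprRecognizer(tokens);
        try {
            r.expr();
            return r.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // expr : multExpr (('+' | '-') multExpr)* ;
    private void expr() {
        multExpr();
        while (lookaheadIs("+") || lookaheadIs("-")) {
            pos++;  // consume the operator
            multExpr();
        }
    }

    // multExpr : atom ('*' atom)* ;
    private void multExpr() {
        atom();
        while (lookaheadIs("*")) {
            match("*");
            atom();
        }
    }

    // atom : INT | ID | '(' expr ')' ;
    private void atom() {
        if (lookaheadIs("(")) {
            match("(");
            expr();
            match(")");
        } else if (pos < tokens.size() && tokens.get(pos).matches("[0-9]+|[a-zA-Z]+")) {
            pos++;  // INT or ID
        } else {
            throw new IllegalStateException("unexpected token at position " + pos);
        }
    }

    private boolean lookaheadIs(String text) {
        return pos < tokens.size() && tokens.get(pos).equals(text);
    }

    private void match(String text) {
        if (!lookaheadIs(text)) {
            throw new IllegalStateException("expected " + text + " at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("1", "+", "2", "*", "3", "+", "(", "a", ")")));
        System.out.println(accepts(Arrays.asList("2", "+", "*", "3")));
    }
}
```

Unlike the generated parser, this sketch stops at the first error instead of reporting it and recovering.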
A simple test program
import org.antlr.v4.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        CharStream input = CharStreams.fromFileName("inputs/test.txt");
        ExprLexer lexer = new ExprLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExprParser parser = new ExprParser(tokens);
        parser.prog(); // parse the input stream!
    }
}
Create a folder in your IntelliJ project called inputs, and then create a file in it called test.txt.
Mark the inputs folder as Resources Root.
2 + 3
x = 1 + 2 * 3 + (a)
The lexer builds a token stream from the input, and the parser checks the syntax of that stream against the supplied grammar.
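To make the token stream concrete, here is a minimal hand-written tokenizer that mimics the four lexer rules of Expr.g4 (ID, INT, NEWLINE, and the skipped WS) plus the literal characters used in the parser rules. It is a sketch built on java.util.regex, not the generated ExprLexer; the class name `MiniLexer` and the string format of the tokens it returns are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written tokenizer mirroring the lexer rules of Expr.g4 (illustration only). */
class MiniLexer {
    // One named group per lexer rule, plus one group for the literals '=', '+', '-', '*', '(' and ')'.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ID>[a-zA-Z]+)|(?<INT>[0-9]+)|(?<NEWLINE>\\r?\\n)|(?<WS>[ \\t]+)|(?<LITERAL>[=+\\-*()])");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            if (m.group("ID") != null) {
                tokens.add("ID:" + m.group());
            } else if (m.group("INT") != null) {
                tokens.add("INT:" + m.group());
            } else if (m.group("NEWLINE") != null) {
                tokens.add("NEWLINE");
            } else if (m.group("LITERAL") != null) {
                tokens.add("'" + m.group() + "'");
            }
            // A WS match adds nothing: this corresponds to "-> skip" in the grammar.
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 1 + 2 * 3 + (a)\n"));
    }
}
```

The parser never sees the whitespace: only the ID, INT, NEWLINE, and literal tokens reach the rules prog, stat, expr, multExpr, and atom.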
To create a translator or interpreter from the recognizer, the meaning of the valid token stream must be expressed.
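Two approaches: listeners and visitors. With a visitor, the parse tree is traversed through explicit calls to the visit method.
The Language Implementation Patterns and The Definitive ANTLR 4 Reference books talk about implementing listeners and visitors.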
Here is an ANTLR4 example of a calculator on GitHub.
https://github.com/shmatov/antlr4-calculator
In this example, the input is parsed, and then a "visitor" will traverse the parse tree to evaluate the expressions.
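The idea of evaluating by visiting can be sketched without the ANTLR runtime. The code below does not use the generated ExprBaseVisitor or a real parse tree: it hand-builds a small tree for `1 + 2 * 3 + (a)` and evaluates it with a visitor. All names (`MiniVisitorDemo`, `Node`, `EvalVisitor`, and so on) are hypothetical, and treating an unassigned variable as 0 is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hand-built tree plus an evaluating visitor (illustration only, no ANTLR classes). */
class MiniVisitorDemo {
    interface Visitor<T> {
        T visitInt(IntNode node);
        T visitId(IdNode node);
        T visitBinary(BinaryNode node);
    }

    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class IntNode implements Node {
        final int value;
        IntNode(int value) { this.value = value; }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitInt(this); }
    }

    static final class IdNode implements Node {
        final String name;
        IdNode(String name) { this.name = name; }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitId(this); }
    }

    static final class BinaryNode implements Node {
        final char op;
        final Node left;
        final Node right;
        BinaryNode(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public <T> T accept(Visitor<T> visitor) { return visitor.visitBinary(this); }
    }

    /** Computes the integer value of a tree; variables live in a simple map. */
    static final class EvalVisitor implements Visitor<Integer> {
        private final Map<String, Integer> memory = new HashMap<>();

        void assign(String name, int value) { memory.put(name, value); }

        public Integer visitInt(IntNode node) { return node.value; }

        // Assumption of this sketch: an unassigned variable evaluates to 0.
        public Integer visitId(IdNode node) { return memory.getOrDefault(node.name, 0); }

        public Integer visitBinary(BinaryNode node) {
            int left = node.left.accept(this);    // children are visited explicitly
            int right = node.right.accept(this);
            switch (node.op) {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                default: throw new IllegalArgumentException("unknown operator " + node.op);
            }
        }
    }

    /** Evaluates 1 + 2 * 3 + (a) with a = 4. */
    static int demo() {
        Node product = new BinaryNode('*', new IntNode(2), new IntNode(3));
        Node tree = new BinaryNode('+', new BinaryNode('+', new IntNode(1), product), new IdNode("a"));
        EvalVisitor eval = new EvalVisitor();
        eval.assign("a", 4);
        return tree.accept(eval);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With ANTLR the tree would come from the generated parser rather than being built by hand, but the evaluation follows the same shape: each visit method computes its result from the results of explicitly visiting its children.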