EBNF Grammar for MiniJava

Goal = MainClass, { ClassDeclaration }, EOF;
MainClass = "class", Identifier, "{", "public", "static", "void", "main", "(", "String", "[", "]", Identifier, ")", "{", Statement, "}", "}";
ClassDeclaration = "class", Identifier, [ "extends", Identifier ], "{", { VarDeclaration }, { MethodDeclaration }, "}";
VarDeclaration = Type, Identifier, ";";
MethodDeclaration = "public", Type, Identifier, "(", [ Type, Identifier, { ",", Type, Identifier } ], ")", "{", { VarDeclaration }, { Statement }, "return", Expression, ";", "}";
Type = "int", "[", "]"
| "boolean"
| "int"
| Identifier
;
Statement = "{", { Statement }, "}"
| "if", "(", Expression, ")", Statement, "else", Statement
| "while", "(", Expression, ")", Statement
| "System.out.println", "(" , Expression, ")", ";"
| Identifier, "=", Expression, ";"
| Identifier, "[", Expression, "]", "=", Expression, ";"
;
Expression = Expression , ( "&&" | "<" | "+" | "-" | "*" ) , Expression
| Expression, "[", Expression, "]"
| Expression, ".", "length"
| Expression, ".", Identifier, "(", [ Expression, { ",", Expression } ], ")"
| IntegerLiteral
| "true"
| "false"
| Identifier
| "this"
| "new", "int", "[", Expression, "]"
| "new", Identifier, "(", ")"
| "!", Expression
| "(", Expression, ")"
;
Identifier is one or more letters, digits, and underscores, starting with a letter
IntegerLiteral is one or more decimal digits
EOF is a distinguished token returned by the scanner at end-of-file
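
The two lexical definitions above can be written as regular expressions. The following is a minimal sketch in Java, assuming that "letters" means the US-ASCII letters A-Z and a-z. The class and method names (LexicalPatterns, isIdentifier, isIntegerLiteral) are illustrative only and are not part of MiniJava. The sketch does not exclude reserved words, which a real scanner would also have to check.

```java
import java.util.regex.Pattern;

class LexicalPatterns {
    // Identifier: a letter followed by any number of letters, digits, or underscores.
    // Assumption: only US-ASCII letters count as letters.
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // IntegerLiteral: one or more decimal digits.
    static final Pattern INTEGER_LITERAL = Pattern.compile("[0-9]+");

    // True if the whole string has the shape of an identifier.
    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    // True if the whole string has the shape of an integer literal.
    static boolean isIntegerLiteral(String s) {
        return INTEGER_LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("x_1"));      // starts with a letter
        System.out.println(isIdentifier("_x"));       // starts with an underscore
        System.out.println(isIntegerLiteral("42"));   // decimal digits only
    }
}
```

Under these patterns "x_1" is accepted as an identifier, while "_x" and "1x" are not, because an identifier must start with a letter.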

EBNF

ISO/IEC 14977: 1996(E)

MiniJava Character Set

A MiniJava program is a text file consisting of US-ASCII characters. (This differs from Java, which handles multiple character sets as input.)

MiniJava has no Java/Python Unicode escapes (\uXXXX), but they are easy to include using the JavaCC option JAVA_UNICODE_ESCAPE=true.

MiniJava tokens are separated by white space (SP, HT, or FF) or line terminator (LF, CR, or CR + LF) characters. As in Java, the input can end with the US-ASCII SUB character, also known as "control-Z".
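
The separator rule can be sketched as a small helper for a hand-written scanner. This is only an illustration, assuming the whole input is held in a String; the names WhitespaceSkipper, skipWhitespace, and atEndOfInput are hypothetical. A generated scanner (for example, one produced by JavaCC) would express the same rule declaratively instead.

```java
class WhitespaceSkipper {
    // US-ASCII SUB (control-Z), code 26.
    static final char SUB = (char) 0x1A;

    // Returns the index of the first character at or after pos that is not
    // white space (SP, HT, FF) or part of a line terminator (LF, CR).
    // CR + LF is handled as two consecutive skipped characters.
    static int skipWhitespace(String input, int pos) {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r') {
                pos++;
            } else {
                break;
            }
        }
        return pos;
    }

    // End of input is either the real end of the text or a single SUB
    // character in the very last position.
    static boolean atEndOfInput(String input, int pos) {
        return pos >= input.length()
            || (pos == input.length() - 1 && input.charAt(pos) == SUB);
    }
}
```

A scanner loop would call skipWhitespace before each token and report EOF once atEndOfInput is true.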

Java Comments

Comments are // to end of line and /* ... */, just as in Java. The /* ... */ comments do not nest in Java. For example,
/*
   One comment
   /*  Nested comment */
   Bad things will happen
*/
The second /* will be ignored (it is in a comment), and the first */ will terminate the comment. Now, "bad things will happen" as the remaining text is not a comment.

Appel, 2nd edition, page 484, describes comments in MiniJava as being nestable. Supporting nested comments is an interesting exercise for the scanner, but it is not correct Java.

Any (Unicode) character except LF and CR is legal in a Java comment. In MiniJava we consider only Latin-1 characters, so only Latin-1 characters except LF and CR are legal in a MiniJava comment.
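
The non-nesting behavior described above can be sketched as follows. This is a minimal illustration, assuming the input is held in a String; the names CommentSkipper and skipComment are hypothetical, and throwing IllegalArgumentException for an unterminated comment is just one possible way to report the error.

```java
class CommentSkipper {
    // If a comment starts at pos, returns the index just past it;
    // otherwise returns pos unchanged.
    static int skipComment(String input, int pos) {
        if (input.startsWith("//", pos)) {
            int i = pos + 2;
            // Run to the end of the line; the line terminator itself is left
            // for the white-space rule.
            while (i < input.length() && input.charAt(i) != '\n' && input.charAt(i) != '\r') {
                i++;
            }
            return i;
        }
        if (input.startsWith("/*", pos)) {
            // No nesting: the first closing delimiter ends the comment,
            // however many opening delimiters appear inside it.
            int end = input.indexOf("*/", pos + 2);
            if (end < 0) {
                throw new IllegalArgumentException("unterminated comment at offset " + pos);
            }
            return end + 2;
        }
        return pos;
    }

    public static void main(String[] args) {
        String src = "/* One comment /* Nested comment */ Bad things will happen */";
        int after = skipComment(src, 0);
        // Prints the text that is left over once the comment has been skipped.
        System.out.println(src.substring(after));
    }
}
```

Run on the example above, the text remaining after the comment is " Bad things will happen */", which the scanner would then try to read as ordinary tokens.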

MiniJava is supposed to be a subset of Java. As far as practical, every illegal Java program should be an illegal MiniJava program. For example,

class Main {
  void m() {
    int goto = 3; // 'goto' is a reserved word in Java
  }
}

A MiniJava program with uninitialized variables ought to be illegal, just as in Java. I don't think it is possible to write a MiniJava program with an unreachable statement (which would be illegal in Java).

Java Keywords

Many Java keywords are not used in MiniJava. Consult the Java 13 language specification for details about Java keywords.
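
One way to keep MiniJava a subset of Java is to have the scanner reject every Java reserved word as an identifier, including those MiniJava never uses (such as goto). The sketch below assumes the keyword list of the Java 13 specification; the list was transcribed by hand and should be checked against the specification. The names ReservedWords and isReserved are hypothetical.

```java
import java.util.Set;

class ReservedWords {
    // Java keywords (assumed to match the Java 13 specification; verify before relying on it).
    static final Set<String> JAVA_KEYWORDS = Set.of(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "enum",
        "extends", "final", "finally", "float", "for", "goto", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new",
        "package", "private", "protected", "public", "return", "short", "static",
        "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
        "transient", "try", "void", "volatile", "while", "_");

    // These are literals rather than keywords in Java, but they cannot be
    // used as identifiers either.
    static final Set<String> RESERVED_LITERALS = Set.of("true", "false", "null");

    // True if the word may not be used as an identifier.
    static boolean isReserved(String word) {
        return JAVA_KEYWORDS.contains(word) || RESERVED_LITERALS.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(isReserved("goto"));    // reserved in Java, unused in MiniJava
        System.out.println(isReserved("counter")); // an ordinary identifier
    }
}
```

With such a check in the scanner, a declaration like "int goto = 3;" is rejected at the lexical level, as it would be by a Java compiler.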

Relaxations from Java

In order to simplify the programming project, we make the following simplifications: (However, you may include these features, if you wish.)

A minimal MiniJava compiler must, however, include the following features: