Ken Domino edited this page Apr 29, 2024 · 25 revisions

Welcome to grammars-v4

The grammars-v4 repository is a collection of ANTLR4 grammars contributed by authors around the world. Grammars-v4 uses trgen, antlr4test-maven-plugin, a number of scripts in the _scripts directory, and GitHub Actions to ensure that all grammars in the tree build and parse input files properly with ANTLR4.

Each grammar has a directory of examples, which contains input files and the expected output from the parse (parse errors are recorded in .errors files; the parse tree of the input is recorded in .tree files). Testing is performed across: Ubuntu, macOS, and Windows operating systems; Cpp (C++), CSharp (C#), Dart (Dart2), Go, Java, JavaScript, PHP, and Python3 targets; Bash and PowerShell environments.
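As a rough illustration of the examples layout, the sketch below lists example inputs that have no recorded parse tree yet. It assumes that the expected output for an input such as examples/foo.txt is stored next to it as examples/foo.txt.tree and examples/foo.txt.errors; that naming scheme and the function name missing_trees are assumptions made for this sketch, not something this page specifies.

```shell
# Hypothetical helper: print every example input in a directory that has
# no matching .tree file. Assumes expected output is stored as
# <input>.tree and <input>.errors alongside the input (an assumption).
missing_trees() {
  examples_dir="$1"
  for f in "$examples_dir"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.tree|*.errors) continue ;;   # skip the expected-output files themselves
    esac
    [ -e "$f.tree" ] || echo "$f"    # report inputs without a recorded tree
  done
}

# Usage (from the root directory of a grammar): missing_trees examples
```

An empty result would suggest that every input already has a recorded tree to compare against.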

A core value of grammars-v4 is that any grammar downloaded from grammars-v4 will compile properly with ANTLR4, and has been validated against some example inputs.

FAQ

What are the licensing terms for Grammars-v4?

There is no single license for the grammars; each grammar has its own license. Check inside the grammar files for licensing terms.

There is no grammar for the language or file format I need. What do I do?

You are welcome to submit an issue ticket, and contributions to the grammars tree are also welcome.

If you add a grammar, you should add a desc.xml, an examples/ directory to test it, and a readme.md to document the grammar.

What is required to submit a grammar?

  • You need to place the grammar in a directory that is appropriately named.
  • In that directory (aka "the root directory for the grammar"), add the .g4 files, a desc.xml, and examples in an examples/ directory. Please include a readme.md with notes on the source for the grammar, version information, copyrights, authorship, etc.
  • You can make the grammar combined (one .g4) or split (two .g4's). If combined, the grammar name and the file name must be identical. Do not add "Parser" to the name of a combined grammar. If split, the name of the lexer must end in "Lexer" and the name of the parser must end in "Parser".
  • Actions or semantic predicates are ok if necessary for defining syntax. It is best if you use "target agnostic format".
  • Make sure you have tested the grammar for Java.
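The naming rules above can be checked mechanically before opening a PR. The following is a minimal sketch, not part of the repository's tooling: the function name check_grammar_names is hypothetical, and it only covers the one-file (combined) and two-file (split) cases described in the list.

```shell
# Hypothetical helper: check the .g4 files in a grammar's root directory
# against the combined/split naming rules. Variables are not scoped to the
# function, to keep the sketch portable across shells.
check_grammar_names() {
  dir="$1"
  count=0 lexer="" parser="" single=""
  for f in "$dir"/*.g4; do
    [ -e "$f" ] || continue
    count=$((count + 1))
    base="$(basename "$f" .g4)"
    single="$base"
    case "$base" in
      *Lexer)  lexer="$base" ;;
      *Parser) parser="$base" ;;
    esac
  done

  if [ "$count" -eq 1 ]; then
    # Combined grammar: no "Parser" suffix, and the declared grammar
    # name must be identical to the file name.
    if [ -n "$parser" ]; then
      echo "error: combined grammar must not end in Parser"
      return 1
    fi
    if ! grep -Eq "^[[:space:]]*grammar[[:space:]]+${single}[[:space:]]*;" "$dir/$single.g4"; then
      echo "error: grammar name does not match file name $single.g4"
      return 1
    fi
    echo "ok: combined grammar $single"
  elif [ "$count" -eq 2 ] && [ -n "$lexer" ] && [ -n "$parser" ]; then
    # Split grammar: one file ends in Lexer, the other in Parser.
    echo "ok: split grammar $lexer + $parser"
  else
    echo "error: expected one combined .g4, or a Lexer/Parser pair"
    return 1
  fi
}

# Usage: check_grammar_names path/to/grammar-root
```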

My PR was rejected! Why?!

If your PR breaks the existing tests, it will be rejected. Additionally, we ask that any incremental changes made to grammar files have examples contributed to the /examples directory for that grammar to ensure that future changes to the grammars don't introduce regressions.

Are there examples of how to use the grammars?

Look here

Is there a coding standard for ANTLR4 grammars?

All grammars in the repository are formatted according to common rules. Formatting is tested for each PR and if that fails the PR is rejected. The tool used to format an ANTLR4 grammar is antlr-format. You need Node.js installed to run it.

All grammars in the repository contain formatting options that mirror the common rules. These options must not be changed in a PR, unless the maintainers change these rules and reformat the entire repository again.

New grammars usually do not contain these formatting options. You can either copy them from an existing grammar or let the antlr-format tool add them for you. To prepare a new grammar for a PR, consult the readme of the antlr-format terminal tool for how to run it, and read its Configuration section for details of the config file to use. Existing grammars don't need a config file because they carry all options as comments, which look like:

// $antlr-format alignTrailingComments true, columnLimit 150, useTab false ...

How can I use ANTLR4 to parse binary files?

There is an example at /tcpheader/

How can I download grammars from the GitHub page in a Maven build?

Use download-maven-plugin:

<plugin>
	<groupId>com.googlecode.maven-download-plugin</groupId>
	<artifactId>download-maven-plugin</artifactId>
	<version>1.4.0</version>
	<executions>
		<execution>
			<phase>generate-sources</phase>
			<goals>
				<goal>wget</goal>
			</goals>
			<configuration>
				<url>https://raw.githubusercontent.com/antlr/grammars-v4/master/arithmetic/arithmetic.g4</url>
				<outputFileName>arithmetic.g4</outputFileName>
				<outputDirectory>src/main/antlr4/com/khubla/antlr4example/</outputDirectory>
			</configuration>
		</execution>
	</executions>
</plugin>

How do I test the grammars?

Using Trgen (all targets)

  1. Install dotnet version 8.
  2. Install antlr4-tools: pip install antlr4-tools. See https://github.com/antlr/antlr4-tools
  3. Install target-specific support, e.g., G++, Dart, Go, etc.
  4. Install the Trash toolkit. See the documentation.
  5. git clone https://github.com/antlr/grammars-v4.git
  6. cd grammars-v4/<grammar-of-your-choice>. E.g., cd grammars-v4/java/java.
  7. trgen. This will create a driver for all implemented targets that work with the grammar. See the desc.xml file for this list.
  8. cd Generated-<target-of-your-choice>. E.g., cd Generated-CSharp.
  9. In a Bash prompt, type make; make test. Or, in a PowerShell prompt, type pwsh build.ps1; pwsh test.ps1. The scripts create temporary files used in the build. Use git clean -f to remove these files.
  10. Tests create .errors and .tree files automatically. If you want, you can check these in for testing across targets and OSes.
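The command-line part of the steps above (steps 2 and 5 through 9, Bash variant) can be collected into one script. This is a sketch under stated assumptions: dotnet 8, the target toolchain, and the Trash toolkit are already installed, and the DRY_RUN switch and the run and test_grammar helpers are hypothetical names introduced here. By default it only prints the commands it would execute.

```shell
# Hypothetical switch: 1 (default) prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# test_grammar GRAMMAR TARGET, e.g. test_grammar java/java CSharp
test_grammar() {
  grammar="$1"
  target="$2"
  run pip install antlr4-tools &&
  run git clone https://github.com/antlr/grammars-v4.git &&
  run cd "grammars-v4/$grammar" &&
  run trgen &&                       # generates a driver per supported target
  run cd "Generated-$target" &&
  run make &&
  run make test
}

test_grammar java/java CSharp
```

Running it with DRY_RUN=0 would execute the commands for real, stopping at the first one that fails.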

Using Maven (Java only)

  1. Clone grammars-v4. git clone https://github.com/antlr/grammars-v4.git
  2. Make sure you have Maven installed. See the documentation.
  3. cd grammars-v4 (the root directory), or cd to a single grammar, e.g., cd grammars-v4/java/java.
  4. Execute mvn clean test.