Skip to content

LL1Checker: A tool to verify if a grammar is LL(1) and to validate input strings against the generated language. Ideal for learning about parsing techniques, compiler design, and formal language theory. Try it out or contribute to improve its functionality!

License

Notifications You must be signed in to change notification settings

jose-rZM/LL1Checker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

LL1Checker

LL1Checker is a tool for verifying if a grammar is LL(1) and for validating whether an input string belongs to the language generated by the grammar. The grammar is read from a file.

πŸ“– Table of Contents

  1. Project Status
  2. What's New in v5.0
  3. Run the Program
  4. Dependencies
  5. Considerations
  6. Structure of grammar.txt
  7. Want to Contribute?
  8. Compilation
  9. Documentation

πŸš€ Project Status

This project is complete, with all essential functionalities implemented and tested with various grammars. It is now in maintenance mode for potential enhancements or bug fixes. If you encounter any issues or unexpected interpretations, please open an issue and include relevant details. Contributions for further improvement are welcome!

πŸ†• What's New in v5.0

  • The lexer, parser, and symbol table now work with numeric token identifiers, improving table lookups and reducing memory usage for large grammars.
  • Boost.Spirit powers the dynamic lexer and the parser now mirrors its diagnostics, so both lexing and parsing errors display multi-line windows with a caret pointing at the failing token.
  • Parse sessions can export a full parse tree to Graphviz-compatible .dot files via --export-tree, making it easy to visualize derivations.
  • Grammar validation is stricter: duplicate terminal or non-terminal identifiers abort loading with a descriptive error, and parse tree exports now fail fast if the output file cannot be created.
  • Command-line handling exits immediately on invalid options, preventing partial runs when arguments are malformed.
  • The build system gained an opt-in static linking toggle (LL1CHECKER_FORCE_STATIC_RUNTIME) and a make static helper target; CMake will also fetch cxxopts automatically when it is not installed.

▢️ Run

You can run the program as follows:

./ll1 <GRAMMAR_FILENAME> [TEXT_FILENAME] [OPTIONS]

Positional Arguments:

  • <GRAMMAR_FILENAME>: Path to the file containing the grammar.
  • [TEXT_FILENAME] (optional): If provided, the program will validate whether the input string belongs to the language defined by the grammar.

Options:

  • -h, --help: Show help message.
  • -v, --verbose: Enable verbose mode, displaying the LL(1) table and input content.
  • --format <FORMAT>: Specify the table format (old or new).
    • If set, verbose mode is enabled automatically.
    • The default format is "new".
  • --text <STRING>: Directly parse the provided text instead of reading a file.
  • --export-tree <FILE>: Write the parse tree to a Graphviz .dot file when parsing succeeds.

Examples:

Checking if a grammar is LL(1)

./ll1 grammar.txt
  • Verifies if the provided grammar is LL(1).

No LL1 grammars

If the grammar provided is not LL1, an error will be displayed alongside its table:

Non-LL1 grammar error screenshot

Checking if an input string belongs to the grammar

./ll1 grammar.txt input.txt
  • Verifies if the grammar is LL(1).
  • Parses the input.txt file according to the grammar.

Parsing text from the command line

./ll1 grammar.txt --text "aa"
  • Parses "aa" according to the grammar without creating a file.

Enabling verbose mode

./ll1 grammar.txt input.txt -v

Verbose parsing screenshot

  • Displays the entire LL(1) table.
  • Prints the contents of input.txt before parsing.

Specifying the table format

./ll1 grammar.txt -v --format old

Legacy LL1 table screenshot

  • Use the old table format when the new format cannot be displayed correctly due to screen size.

NEW! Reporting lexing errors

  • When a lexer error occurs, the program pinpoints the location:

Lexer error with caret indicator

NEW! Reporting parsing errors

  • When a parse error occurs, the program highlights the offending token:

Parse error with caret indicator

NEW! Export tree as .dot file

  • Export the parse tree to a Graphviz .dot file with --export-tree.
  • Run ./ll1 grammar.txt input.txt --export-tree tree.dot to generate tree.dot.
  • Convert the output with dot -Tpng tree.dot -o output.png to obtain an image.
./ll1 examples/grammar_3.txt examples/input_3.txt --export-tree tree.dot
dot -Tpng tree.dot -o output.png

Parse tree rendered from dot output

Error Handling: If <GRAMMAR_FILENAME> or <TEXT_FILENAME> do not exist or cannot be opened, the program prints an error and exits.

πŸ“¦ Dependencies

  • C++20 capable compiler and CMake β‰₯ 3.16.
  • Boost headers (tested with Boost 1.78+); required for the lexer built on Boost.Spirit.
  • cxxopts for command-line parsing. CMake will reuse an existing installation or automatically fetch v3.1.1 during configuration.

πŸ“Œ Considerations

  • Avoid using the reserved symbols <<EPSILON>> and <<EOF>>.
  • For terminal symbols, note that order matters. If two regexes have common elements, place the more specific one first, as in the example:
terminal WH "while";
terminal WORD [a-zA-Z][a-zA-Z]*;

πŸ“„ Structure of grammar.txt

The grammar file has two sections separated by ;: symbol definition and grammar definition.

Symbol definition

The terminal symbols follow the structure terminal <IDENTIFIER> <REGEX>; (just like declaring a variable). The <IDENTIFIER> should adhere to the regex pattern [a-zA-Z_\'][a-zA-Z_\'0-9]*. An example of the first section would be:

terminal a a;
start with S;
;

You should write the last line to designate S as the axiom. The grammar is augmented internally with a new non-terminal whose production is S' -> S <<EOF>>.

Grammar definition

The grammar follows this structure:

S -> A$;
A -> aaA;
A ->;
;

The line A->; represents an empty production. So, our grammar.txt would be:

terminal a a;
start with S;
;
S -> A$;
A -> aaA;
A ->;
;

This grammar generates the following language: L(G) = {aa, aaaa, aaaaaa, ...}, that is, a language with an even number of 'a'. Place the string you want to validate in input.txt.

πŸ—‚οΈ Project Structure

The repository follows a simple layout keeping sources and headers separated:

LL1Checker/
β”œβ”€β”€ CMakeLists.txt
β”œβ”€β”€ Makefile         # Helper Makefile to speed up compilation or formatting
β”œβ”€β”€ src/             # Source files and CMake logic
β”œβ”€β”€ include/         # Header files
β”œβ”€β”€ app/             # Main file
β”œβ”€β”€ examples/        # Example grammars and inputs
└── docs/            # Generated documentation

🀝 Want to Contribute?

To get started, you'll need the following:

  • A working C++20 toolchain plus the dependencies listed above.
  • Familiarity with CMake; configuration will fetch cxxopts automatically if it is not already available on your system.

Feel free to reach out if you have any questions or suggestions! 😊

πŸ› οΈ Compilation

To build the project using CMake:

cmake -S . -B build
cmake --build build

Alternatively, you can simply run:

make

This will configure and compile the project using the default Debug build type. To compile in Release mode:

make BUILD_TYPE=Release

To produce a static Release build (when your toolchain supports it):

make static
# or make BUILD_TYPE=Release STATIC=1 build

This passes -DLL1CHECKER_FORCE_STATIC_RUNTIME=ON to CMake and links the C++ runtime statically where available.

The ll1 executable will be located at build/app/ll1.

πŸ“š Documentation

The complete API documentation is available here: LL1Checker Documentation

About

LL1Checker: A tool to verify if a grammar is LL(1) and to validate input strings against the generated language. Ideal for learning about parsing techniques, compiler design, and formal language theory. Try it out or contribute to improve its functionality!

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages