In a series of previous articles, we learnt about automated unit test generation using search-based and property-based methods. We also looked at Pathgrind, a tool for dynamic symbolic execution that can be used for automated fuzzing of binaries. Continuing on the same theme, in this article we will look at how grammar-based test case generation works in practice. We also present a new tool - Gramtest. Gramtest allows you to generate test cases based on arbitrary user defined grammars. Potential applications of the tool include automated fuzzing and testing.
Several programs (like parsers, interpreters and compilers) that work on structured input can be tested using grammars. These applications process their input in different stages like tokenizing, building parse tree, converting to AST and evaluating the AST. For such applications, due to the large number of control flow paths in the early processing, random fuzzing does not yield good test cases. Generating tests that exploit the structured nature of the input can provide better results. The simplest way to provide specify the input is in from of context-free grammars.
Context-free grammar
A context-free grammar or CFG is a set of recursive rewriting rules (also called productions) used to generate patterns of strings. As an example, consider the following CFG for arithmetic expressions.
<expression> ::= <term> <addOps> <expression> | <term>
<term> ::= <factor> <multOps> <term> | <factor>
<addOps> ::= + | -
<multOps> ::= * | /
<factor> ::= "(" <expression> ")" | <constant>
<constant> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The above grammar captures the language of all strings using four operators ("+","-","","/") brackets ("(",")") and numbers (0-9). The set of symbols that can appear in the strings generated by the grammar are called terminals. We can generate all the strings in the grammar by following the production rules. E.g. for generating the string "(1 + 2) 3", we can apply the following rules:
<expression> ::= <term>
<term> ::= <factor> <multiOps> <term>
<factor> ::= "(" <expression> ")"
<expression> ::= <term> <addOps> <expression>
<term> ::= <factor>
<factor> ::= <constant>
<constant> ::= 1
<addOps> ::= +
<expression> ::= <term>
<term> ::= <factor>
<factor> ::= <constant>
<constant> ::= 2
<multiOps> ::= *
<term> ::= <factor>
<factor> ::= <constant>
<constant> ::= 3
Each string in the grammar starts at the first symbol and then follows the production rules till it reaches a terminal symbol. The rules of the grammar as given above are said to be in Backus-Naur Form (or BNF). It is one of two main notation techniques used for representing context-free grammars. Once we have specified the input to a program in BNF, we can do test case generation by exhaustively applying all the production rules to generate strings. We present Gramtest a tool written in Java that can be used for this purpose.
Gramtest
Gramtest is implemented using the ANTLR4 parser generator. To specify the structure of the inputs used to generate tests we use the BNF grammar available from the ANTLR repository. In addition, there are some useful Maven plugins that we use for our development while working with the grammars. The BNF grammar allows us to recognize any language given in the BNF format. The syntax for BNF can itself be represented with a BNF as follows:
<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace>
"::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> "|" <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text> '"' | "'" <text> "'"
The grammar for arithmetic expressions given in previous section fits in this syntax. To generate tests from a given BNF grammar we need to exhaustively enumerate all the strings in the grammar. The Gramtest tool makes it easy to do just that. To run the tool and generate test cases for the arithmetic expressions grammar, we just run the following on the command line:
Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf
Generating tests ...
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+0
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+1
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+2
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+3
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+4
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+5
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+6
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+7
(0*0+0)*(0)*0*0*0+(0)*0*0*0+0*0*0+0*0+8
...
The "-file" command tells Gramtest to look for the input grammar in the file "arithexp.bnf". By default the generated tests are printed on the screen. In case you want to save them to a folder to use with your program you can use the "-tests" option as follows:
Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/arithexp.bnf -tests generated-tests
Generating tests ...
All tests have been saved in the generated-tests folder!
This will save all the test cases in the "generated-tests" folder.
Asankhayas-MacBook-Pro:target asankhaya$ ls generated-tests/
1.txt 17.txt 25.txt 33.txt 41.txt 5.txt 58.txt 66.txt 74.txt 82.txt 90.txt 99.txt
10.txt 18.txt 26.txt 34.txt 42.txt 50.txt 59.txt 67.txt 75.txt 83.txt 91.txt
100.txt 19.txt 27.txt 35.txt 43.txt 51.txt 6.txt 68.txt 76.txt 84.txt 92.txt
11.txt 2.txt 28.txt 36.txt 44.txt 52.txt 60.txt 69.txt 77.txt 85.txt 93.txt
...
The test cases can then be run with the target program for fuzzing and automated testing. As an another example, lets consider a BNF grammar for generating all strings that have the word "main" in them.
<program> ::= m a i n
::= { }
::= A | B | C | D | E | F | G | H | I | J | K | L | M | N |
O | P | Q | R | S | T | U | V | W | X | Y | Z |
a | b | c | d | e | f | g | h | i | j | k | l | m | n |
o | p | q | r | s | t | u | v | w | x | y | z
The above BNF grammar uses the curly brackets ("{", "}") construct in BNF to apply the production rule, zero or more times. Running Gramtest with this grammar as input produces strings that contain the world "main" somewhere in them.
Asankhayas-MacBook-Pro:target asankhaya$ java -jar gramtest-0.1-SNAPSHOT-jar-with-dependencies.jar
-file ../src/test/resources/main.bnf
Generating tests ...
AAAmainAAA
AAAmainAAB
AAAmainAAC
AAAmainAAD
AAAmainAAE
AAAmainAAF
AAAmainAAG
AAAmainAAH
AAAmainAAI
AAAmainAAJ
...
In addition to the special curly bracket symbols, Gramtest also supports the square brackets ("[", "]") for specifying an optional production rule. While, the parentheses ("(", ")") are used for repeating the rule one or more times. For details on the syntax support please refer to the BNF ANTLR4 grammar that is included with the sources of Gramtest. Hopefully, by now you are convinced that Gramtest is a useful tool to generate test cases from arbitrary user defined grammars.
We will look at some of the details behind the implementation of the tool in a future article. Meanwhile, do let us know your comments on grammar-based testing and please feel free to contribute to the tool by forking it on Github.