An Introduction To Scala Parser Combinators - Part 3: Writing unit tests for parsers
Jan 29, 2011 16:22
Today I will describe how to test the basic literal and variable parsers we wrote in the last part of this series. Parser development is a great candidate for TDD, as it is pretty clear how to write the tests. Writing your tests in parallel with your parser will speed up your work significantly. Should you want to tune the performance of your parser later, good test coverage is essential so you don’t break it during refactoring. In my examples I use ScalaTest, but the principles outlined here work for any testing framework.
How to call the parser in tests
Our parsers are always members of a trait, so we will not create an instance of a parser object for testing but instead mix the parser trait right into our test class:
class ExpressionParsersTest extends ExpressionParsers with FlatSpec with ShouldMatchers {
}
We will write a little utility function to execute a parser so the tests stay readable:
private def parsing[T](s:String)(implicit p:Parser[T]):T = {
  //wrap the parser in the phrase parser to make sure all input is consumed
  val phraseParser = phrase(p)
  //we need to wrap the string in a reader so our parser can digest it
  val input = new CharSequenceReader(s)
  phraseParser(input) match {
    case Success(t,_) => t
    case NoSuccess(msg,_) => throw new IllegalArgumentException(
      "Could not parse '" + s + "': " + msg)
  }
}
I chose the name parsing as it mixes well with the should matcher DSL. There are a few peculiarities about that method:
The parser we pass in to test is wrapped in another parser called phrase. This parser makes sure that all input is consumed. Otherwise, testing the boolean parser would seem to succeed when parsing "truefoo", as it would consume "true" and return a success with the rest of the input. Enforcing the parsing of all input in tests is a matter of taste; I found it is the behavior I wanted in my tests.
The parser passed to the function for testing is marked implicit. While I think implicit should be used rarely and with great care, we will see that it is a fitting use here and will make the tests more readable.
We just turn any non-successful parse result into an exception, which will fail our test.
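The effect of phrase can be seen in isolation with a small sketch. It assumes the scala-parser-combinators library is on the classpath; the bool parser is a hypothetical stand-in for the boolean parser from the previous part, not the article's actual definition.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader

object PhraseDemo extends RegexParsers {
  // Hypothetical stand-in for the boolean parser from the previous part
  def bool: Parser[Boolean] = ("true" | "false") ^^ (_.toBoolean)

  // Runs the bare parser: succeeds as soon as a prefix of the input matches
  def acceptsPrefix(s: String): Boolean =
    bool(new CharSequenceReader(s)).successful

  // Runs the parser wrapped in phrase: succeeds only if all input is consumed
  def acceptsAll(s: String): Boolean =
    phrase(bool)(new CharSequenceReader(s)).successful
}
```

Under these assumptions, acceptsPrefix("truefoo") reports a success while acceptsAll("truefoo") does not, which is the reason the helper wraps the parser under test.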
Often one needs to assert that a parser does not accept a certain input. To reduce clutter, we add another helper for that:
private def assertFail[T](input:String)(implicit p:Parser[T]): Unit = {
  evaluating(parsing(input)(p)) should produce[IllegalArgumentException]
}
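Since the same idea works with any testing framework, here is a minimal framework-independent sketch of what such a failure check does. The parsing function below is a hypothetical stand-in that accepts only "true" and "false", and failsWith is an illustrative name, not part of ScalaTest or of the code above.

```scala
import scala.reflect.ClassTag
import scala.util.{Failure, Try}

object FailureCheckDemo {
  // Hypothetical stand-in for the parsing helper: accepts only "true" and "false"
  def parsing(s: String): Boolean = s match {
    case "true"  => true
    case "false" => false
    case other   =>
      throw new IllegalArgumentException("Could not parse '" + other + "'")
  }

  // True only if evaluating `body` throws an exception of type E
  def failsWith[E <: Throwable](body: => Any)(implicit tag: ClassTag[E]): Boolean =
    Try(body) match {
      case Failure(e) => tag.runtimeClass.isInstance(e)
      case _          => false
    }
}
```

The by-name parameter delays evaluation of the parse until the check is ready to catch the exception, which is the same job the evaluating block does in the helper above.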
The tests for literals & variables
Now let’s test our literals. With our new helpers, this is fast and easy. Let us try booleans first:
"The ExpressionParsers" should "parse boolean literals" in {
  //just declare the parser to test once and mark it implicit
  //that way our test functions will use it automagically
  implicit val parserToTest = boolean
  parsing("true") should equal(BooleanLiteral(true))
  parsing("false") should equal(BooleanLiteral(false))
  assertFail("True")
  assertFail("False")
  assertFail("TRUE")
  assertFail("FALSE")
  assertFail("truefoo")
}
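To make the role of the implicit parser in this test more concrete, here is a minimal sketch of the mechanism without any parsing library. MiniParser is a hypothetical stand-in for Parser[T], and the names are chosen only to mirror the test above.

```scala
object ImplicitDemo {
  // Hypothetical minimal "parser": a function from input to an optional result
  final case class MiniParser[T](run: String => Option[T])

  // The second parameter list is implicit, so callers may leave it out
  def parsing[T](s: String)(implicit p: MiniParser[T]): T =
    p.run(s).getOrElse(
      throw new IllegalArgumentException("Could not parse '" + s + "'"))

  def demo(): (Boolean, Boolean) = {
    // Declared once; every parsing(...) call in this scope picks it up
    implicit val parserToTest: MiniParser[Boolean] = MiniParser {
      case "true"  => Some(true)
      case "false" => Some(false)
      case _       => None
    }
    (parsing("true"), parsing("false"))
  }
}
```

The compiler fills in the missing argument from the single implicit value in scope, so each assertion only has to mention the input string.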
We see here that it is important that the result of the parser provides proper equals and hashCode implementations, so we can compare the results with our expected values. You should try to use case classes for your parse results where possible. Now let’s make sure our numbers work:
//scalatest does not provide a "they" and
//as it is ExpressionParserS, not ExpressionParser
//I like to make the test specs look nicer...
val they = it
they should "parse floating point numbers" in {
  implicit val parserToTest = double
  parsing("0.0") should equal (NumberLiteral(0.0))
  parsing("1.0") should equal (NumberLiteral(1.0))
  parsing("-1.0") should equal (NumberLiteral(-1.0))
  parsing("0.2") should equal (NumberLiteral(0.2))
  parsing("-0.2") should equal (NumberLiteral(-0.2))
  parsing(".2") should equal (NumberLiteral(.2))
  parsing("-.2") should equal (NumberLiteral(-.2))
  parsing("2.0e3") should equal (NumberLiteral(2000.0))
  parsing("2.0E3") should equal (NumberLiteral(2000.0))
  parsing("-2.0e3") should equal (NumberLiteral(-2000.0))
  parsing("-2.0E3") should equal (NumberLiteral(-2000.0))
  parsing("2.0e-3") should equal (NumberLiteral(0.002))
  parsing("2.0E-3") should equal (NumberLiteral(0.002))
  parsing("-2.0e-3") should equal (NumberLiteral(-0.002))
  parsing("-2.0E-3") should equal (NumberLiteral(-0.002))
}
they should "parse integral numbers" in {
  implicit val parserToTest = int
  parsing("0") should equal (NumberLiteral(0))
  parsing("1") should equal (NumberLiteral(1))
  parsing("-1") should equal (NumberLiteral(-1))
  parsing("20") should equal (NumberLiteral(20))
  parsing("-20") should equal (NumberLiteral(-20))
}
Interestingly, I found a bug using this test, as I did not adhere to my own advice and wrote the tests for the parsers from the last article after I published the article. The original double parser looked like this:
def double :Parser[Expression] = (decimalNumber | floatingPointNumber) ^^ {
  s => new NumberLiteral(s.toDouble)}
This however failed the following test:
parsing("2.0e3") should equal (NumberLiteral(2000.0))
Why? Well, decimalNumber matched the 2.0 and happily returned a Success, and the dangling e3 did not get parsed, causing the phrase parser to fail. (This is a good example of why I prefer to wrap the parser under test in phrase.) The solution was to move the floatingPointNumber parser to the front so an optional exponent will be consumed, too. In fact, it turned out that the decimalNumber parser was totally useless here, so the updated parser looks like this:
def double :Parser[Expression] = floatingPointNumber ^^ {
  s => new NumberLiteral(s.toDouble)}
Even better, floatingPointNumber also accepts integral literals, and as we parse everything into a Double anyway, we can remove the int parser from the literal parser, too:
def literal :Parser[Expression] = boolean | string | double
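The ordering problem described above can be reproduced on its own. The following sketch assumes the scala-parser-combinators library is on the classpath and uses plain Double results instead of the article's NumberLiteral; the names broken and fixed are illustrative only.

```scala
import scala.util.parsing.combinator.JavaTokenParsers
import scala.util.parsing.input.CharSequenceReader

object OrderedChoiceDemo extends JavaTokenParsers {
  // Mirrors the original ordering: decimalNumber gets the first try
  def broken: Parser[Double] =
    (decimalNumber | floatingPointNumber) ^^ (_.toDouble)

  // Mirrors the updated parser: floatingPointNumber alone
  def fixed: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  // True if the parser consumes the complete input successfully
  def parses(p: Parser[Double], s: String): Boolean =
    phrase(p)(new CharSequenceReader(s)).successful
}
```

With the first alternative succeeding on 2.0, the choice never tries floatingPointNumber, so broken rejects "2.0e3" under phrase while fixed accepts it. The fixed variant also accepts an input such as "20", in line with the observation that a separate parser for integral numbers is not needed.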
Now we add some tests for string literals and variables and we are almost done:
they should "parse string literals" in {
  implicit val parserToTest = string
  parsing("\"test\"") should equal (StringLiteral("\"test\""))
  parsing("\"\"") should equal (StringLiteral("\"\""))//empty string
  assertFail("\"test")//string literal not closed
  assertFail("test")//no quotation marks
  assertFail("\"te\"st\"")//unescaped quotation mark
  //TODO: add interesting cases once we have proper escape handling
}
they should "parse variable names" in {
  implicit val parserToTest = variable
  parsing ("foo") should equal (Variable("foo"))
  parsing ("_foo") should equal (Variable("_foo"))
  parsing ("foo_bar") should equal (Variable("foo_bar"))
  assertFail("foo-bar")
  assertFail("+foo")
  assertFail("foo+")
  assertFail("")
}
Next time
Now that we know how to properly test our work we can move on to adding operators to our expression parser, so we can start using it for something useful. As far as testing is concerned, this was not all we can do to assure the quality of our parsers. In a later article we will use fixpoint tests to take the testing of our parsers one step further. Also we need to make sure that not only our individual parsers parse what they should but the combined parsers work as well. We will refactor the tests presented here in a later part of the series so we can reuse them for the combined expression parser without introducing redundant test cases.