An Introduction To Scala Parser Combinators - Part 3: Writing unit tests for parsers

Jan 29, 2011 16:22 · 1067 words · 6 minutes read Scala Parser Combinators

Today I will describe how to test the basic literal and variable parsers we wrote in the last part of this series. Parser development is a great candidate for TDD, as it is pretty clear how to write the tests. Writing your tests in parallel with your parser will speed up your work significantly. Should you want to tune the performance of your parser later, good test coverage is essential so you don’t break it during refactoring. In my examples here I use ScalaTest, but the principles outlined here work for any testing framework.

How to call the parser in tests

Our parsers are always members of a trait, so we will not create an instance of a parser object for testing but instead mix the parser trait right into our test class:

class ExpressionParsersTest extends ExpressionParsers with FlatSpec with ShouldMatchers {
}

We will write a little utility function to execute a parser so the tests stay readable:

private def parsing[T](s:String)(implicit p:Parser[T]):T = {
    //wrap the parser in the phrase parser to make sure all input is consumed
    val phraseParser = phrase(p)
    //we need to wrap the string in a reader so our parser can digest it
    //(CharSequenceReader lives in scala.util.parsing.input)
    val input = new CharSequenceReader(s)
    phraseParser(input) match {
        case Success(t,_)     => t
        case NoSuccess(msg,_) => throw new IllegalArgumentException(
                                        "Could not parse '" + s + "': " + msg)
    }
}

I chose the name parsing as it mixes well with the should matcher DSL. There are a few peculiarities about that method:

  • The parser we pass in to test is wrapped in another parser called phrase. This parser makes sure that all input is consumed. Otherwise testing the boolean parser would seem to succeed when parsing "truefoo", as it would consume "true" and return a success with the rest of the input left over. Enforcing the parsing of all input in tests is a matter of taste; I found it is the behavior I wanted in my tests.

  • The parser passed to the function for testing is marked implicit. While I think implicit should be used rarely and with great care, we will see that it is a fitting use here and makes the tests more readable.

  • We just turn any non-successful parse result into an exception which will fail our test.
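The effect of phrase can be seen in isolation. The following is a minimal sketch, assuming the scala-parser-combinators module is on the classpath; the names PhraseDemo and bool are hypothetical stand-ins and not the boolean parser from this series.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical stand-in for the series' boolean parser, for illustration only
object PhraseDemo extends JavaTokenParsers {
  def bool: Parser[Boolean] = ("true" | "false") ^^ (_.toBoolean)

  // Without phrase the parser stops after "true" and leaves "foo" unread
  val partial = parse(bool, "truefoo")

  // With phrase the leftover "foo" turns the result into a failure
  val whole = parse(phrase(bool), "truefoo")
}
```

Here partial is a success whose remaining input starts at offset 4, while whole is a failure because the input was not consumed completely.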

Often one needs to assert that a parser does not accept a certain input. To reduce clutter, we add another helper for that:

private def assertFail[T](input:String)(implicit p:Parser[T]): Unit = {
    evaluating(parsing(input)(p)) should produce[IllegalArgumentException]
}
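Since the same idea works without any testing framework, here is a minimal framework-free sketch of both helpers. It assumes the scala-parser-combinators module is on the classpath; the object name ImplicitHelperDemo and the helper fails are hypothetical, and the built-in ident parser stands in for the parsers of this series. It uses parseAll, which wraps the given parser in phrase.

```scala
import scala.util.Try
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical, framework-free variant of the two helpers
object ImplicitHelperDemo extends JavaTokenParsers {
  // The parser under test is picked up from the implicit scope
  def parsing[T](s: String)(implicit p: Parser[T]): T = {
    val result = parseAll(p, s) // parseAll applies phrase(p) to the input
    if (result.successful) result.get
    else throw new IllegalArgumentException("Could not parse '" + s + "': " + result)
  }

  // True if the parser under test rejects the input
  def fails[T](s: String)(implicit p: Parser[T]): Boolean =
    Try(parsing(s)(p)).isFailure

  // Declare the parser once; both helpers below use it automatically
  implicit val parserToTest: Parser[String] = ident

  val name = parsing("foo_bar")
  val rejected = fails("foo-bar")
}
```

The implicit parameter keeps each call down to the input string, which is what makes the assertions read well in a matcher DSL.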

The tests for literals & variables

Now let’s test our literals. With our new helpers, this is fast and easy. Let us try booleans first:

"The ExpressionParsers" should "parse boolean literals" in {
    //just declare the parser to test once and mark it implicit
    //that way our test functions will use it automagically
    implicit val parserToTest = boolean
    parsing("true")  should equal(BooleanLiteral(true))
    parsing("false") should equal(BooleanLiteral(false))
    assertFail("True")
    assertFail("False")
    assertFail("TRUE")
    assertFail("FALSE")
    assertFail("truefoo")
}

We see here that it is important that the result of the parser provides proper equals and hashCode implementations, so we can compare the results with our expected values. You should try to use case classes for your parse results where possible. Now let’s make sure our numbers work:

//scalatest does not provide a "they" and 
//as it is ExpressionParserS, not ExpressionParser
//I like to make the test specs look nicer...
val they = it

they should "parse floating point numbers" in {
    implicit val parserToTest = double
    parsing("0.0")     should equal (NumberLiteral(0.0))
    parsing("1.0")     should equal (NumberLiteral(1.0))
    parsing("-1.0")    should equal (NumberLiteral(-1.0))
    parsing("0.2")     should equal (NumberLiteral(0.2))
    parsing("-0.2")    should equal (NumberLiteral(-0.2))
    parsing(".2")      should equal (NumberLiteral(.2))
    parsing("-.2")     should equal (NumberLiteral(-.2))
    parsing("2.0e3")   should equal (NumberLiteral(2000.0))
    parsing("2.0E3")   should equal (NumberLiteral(2000.0))
    parsing("-2.0e3")  should equal (NumberLiteral(-2000.0))
    parsing("-2.0E3")  should equal (NumberLiteral(-2000.0))
    parsing("2.0e-3")  should equal (NumberLiteral(0.002))
    parsing("2.0E-3")  should equal (NumberLiteral(0.002))
    parsing("-2.0e-3") should equal (NumberLiteral(-0.002))
    parsing("-2.0E-3") should equal (NumberLiteral(-0.002))
}

they should "parse integral numbers" in {
    implicit val parserToTest = int
    parsing("0")    should equal (NumberLiteral(0))
    parsing("1")    should equal (NumberLiteral(1))
    parsing("-1")   should equal (NumberLiteral(-1))
    parsing("20")   should equal (NumberLiteral(20))
    parsing("-20")  should equal (NumberLiteral(-20))
}
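All of these assertions depend on the result classes comparing by value. A minimal sketch of the difference, using the hypothetical classes CaseNumber and PlainNumber rather than the actual result classes of this series:

```scala
// Hypothetical result types, for illustration only
case class CaseNumber(value: Double)   // equals and hashCode are generated
class PlainNumber(val value: Double)   // inherits reference equality

object EqualityDemo {
  // Case classes compare by their fields
  val caseClassesEqual = CaseNumber(2000.0) == CaseNumber(2000.0)

  // Plain classes compare by identity unless equals is overridden
  val plainClassesEqual = new PlainNumber(2000.0) == new PlainNumber(2000.0)
}
```

With a plain class as the parse result, a should equal assertion would fail even when the parser produced the right value.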

Interestingly, the floating point test uncovered a bug: I did not adhere to my own advice and wrote the tests for the parsers from the last article only after I had published it. The original double parser looked like this:

def double  :Parser[Expression] = (decimalNumber | floatingPointNumber) ^^ {
    s => new NumberLiteral(s.toDouble)}

This however failed the following test:

parsing("2.0e3")   should equal (NumberLiteral(2000.0))

Why? Well, decimalNumber matched the 2.0 and happily returned a Success. The dangling e3 was left unparsed, which caused the phrase parser to fail. (This is a good example of why I prefer to wrap the parser under test in phrase.) The solution was to move the floatingPointNumber parser to the front so an optional exponent will be consumed, too. In fact, it turned out that the decimalNumber parser was totally useless here, so the updated parser looks like this:

def double  :Parser[Expression] = floatingPointNumber ^^ {
    s => new NumberLiteral(s.toDouble)}
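The ordering problem can be reproduced in isolation. This is a minimal sketch, assuming the scala-parser-combinators module is on the classpath; AlternationDemo, decimalFirst and floatFirst are hypothetical names.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical demo of how the order of alternatives affects the result
object AlternationDemo extends JavaTokenParsers {
  // The first alternative that succeeds wins; the second is never tried afterwards
  def decimalFirst: Parser[String] = decimalNumber | floatingPointNumber
  def floatFirst: Parser[String]   = floatingPointNumber | decimalNumber

  // parseAll wraps the parser in phrase, so leftover input is an error
  val decimalFirstResult = parseAll(decimalFirst, "2.0e3")
  val floatFirstResult   = parseAll(floatFirst, "2.0e3")
}
```

decimalFirstResult is a failure because decimalNumber stops after 2.0, whereas floatFirstResult succeeds with the full string "2.0e3".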

Even better, floatingPointNumber accepts integral literals as well, and as we parse everything into a Double anyway, we can remove the separate int parser, too:

def literal :Parser[Expression] = boolean | string | double
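A quick way to convince yourself that integral input is covered is a sketch like the following. It assumes the scala-parser-combinators module is on the classpath; IntegralDemo and number are hypothetical names, and the parser returns a bare Double instead of the NumberLiteral used in this series.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical demo: floatingPointNumber also matches literals without a fraction
object IntegralDemo extends JavaTokenParsers {
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  val twenty      = parseAll(number, "20")
  val minusTwenty = parseAll(number, "-20")
}
```

Both results are successes carrying 20.0 and -20.0 respectively, so the integral test cases above can be pointed at the double parser.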

Now we add some tests for string literals and variables and we are almost done:

they should "parse string literals" in {
    implicit val parserToTest = string
    parsing("\"test\"") should equal (StringLiteral("\"test\""))
    parsing("\"\"") should equal (StringLiteral("\"\""))//empty string

    assertFail("\"test")//string literal not closed
    assertFail("test")//no quotation marks
    assertFail("\"te\"st\"")//unescaped quotation mark
    //TODO: add interesting cases once we have proper escape handling
}

they should "parse variable names" in {
    implicit val parserToTest = variable
    parsing ("foo") should equal (Variable("foo"))
    parsing ("_foo") should equal (Variable("_foo"))
    parsing ("foo_bar") should equal (Variable("foo_bar"))

    assertFail("foo-bar")
    assertFail("+foo")
    assertFail("foo+")
    assertFail("")
}

Next time

Now that we know how to properly test our work, we can move on to adding operators to our expression parser, so we can start using it for something useful. As far as testing is concerned, this is not all we can do to assure the quality of our parsers. In a later article we will use fixpoint tests to take the testing of our parsers one step further. We also need to make sure that not only our individual parsers parse what they should, but that the combined parsers work as well. We will refactor the tests presented here in a later part of the series so we can reuse them for the combined expression parser without introducing redundant test cases.