Tricks for common situations


Matching EOF (End Of File)

Ahh Sir, you’ll be needin what us parsers call epsilon:


  rule(:eof) { any.absent? }

Of course, most of us don’t use this at all: by default, parse fails unless it consumes the whole input, so every parser already has EOF as its implicit last input.

Matching Strings Case Insensitive

Parslet is fully hackable: You can use code to create parsers easily. Here’s how I would match a string in a case-insensitive manner:


  def stri(str)
    key_chars = str.split(//)
    key_chars.
      collect! { |char| match["#{char.upcase}#{char.downcase}"] }.
      reduce(:>>)
  end

  # Constructs a parser using a Parsing Expression Grammar
  stri('keyword').parse "kEyWoRd"     # => "kEyWoRd"@0

Testing

Parslet helps you to create parsers that are in turn created out of many small parsers. It is really turtles all the way down. Imagine you have a complex parser:


  class ComplexParser < Parslet::Parser
    root :lots_of_stuff
  
    rule(:lots_of_stuff) { ... }
  
    # and many lines later: 
    rule(:simple_rule) { str('a') }
  end

Also imagine that the parser (as a whole) fails to consume the ‘a’ that simple_rule is talking about.

This kind of problem can very often be fixed by bisecting it into two possible problems. Either:

  1. the lots_of_stuff rule somehow doesn’t place simple_rule in the right context or
  2. the simple_rule simply (hah!) fails to match its input.

I find it very useful in this situation to eliminate 2. from our options:


  require 'rspec'
  require 'parslet/rig/rspec'
  
  class ComplexParser < Parslet::Parser
    rule(:simple_rule) { str('a') }
  end

  RSpec.describe ComplexParser do
    let(:parser) { ComplexParser.new }
    context "simple_rule" do
      it "should consume 'a'" do
        expect(parser.simple_rule).to parse('a')
      end 
    end
  end
  
  RSpec::Core::Runner.run(['--format', 'documentation'])

Output is:

Example::ComplexParser
  simple_rule
    should consume 'a'

Finished in 0.00094 seconds (files took 0.29367 seconds to load)
1 example, 0 failures

Parslet parsers have one method per rule. These methods return valid parsers for a subset of your grammar.

Error reports

If your grammar fails and you’re aching to know why, here’s a bit of exception handling code that will help you out:


  parser = str('foo')
  begin
    parser.parse('bar')
  rescue Parslet::ParseFailed => error
    puts error.parse_failure_cause.ascii_tree
  end

This should print something akin to:

 
Expected "foo", but got "bar" at line 1 char 1.

These error reports are probably the fastest way to know exactly where you went wrong (or where your input is wrong, which is equivalent).

And since this is such a common idiom, we provide a shortcut. To get the same report as above, just:


  require 'parslet/convenience'
  parser.parse_with_debug(input)

Reporter engines

Note that there is currently not one, but two error reporting engines! The default engine will report errors in a structure that looks exactly like the grammar structure:


  class P < Parslet::Parser
    root(:body)
    rule(:body) { elements }
    rule(:elements) { (call | element).repeat(2) }
    rule(:element) { str('bar') }
    rule(:call) { str('baz') >> str('()') }
  end
  
  begin
    P.new.parse('barbaz')
  rescue Parslet::ParseFailed => error
    puts error.parse_failure_cause.ascii_tree
  end

Outputs:

 
Expected at least 2 of CALL / ELEMENT at line 1 char 1.
`- Expected one of [CALL, ELEMENT] at line 1 char 4.
   |- Failed to match sequence ('baz' '()') at line 1 char 7.
   |  `- Premature end of input at line 1 char 7.
   `- Expected "bar", but got "baz" at line 1 char 4.

Let’s switch out the ‘grammar structure’ engine (called ‘Tree’) with the ‘deepest error position’ engine:


  class P < Parslet::Parser
    root(:body)
    rule(:body) { elements }
    rule(:elements) { (call | element).repeat(2) }
    rule(:element) { str('bar') }
    rule(:call) { str('baz') >> str('()') }
  end
  
  begin
    P.new.parse('barbaz', reporter: Parslet::ErrorReporter::Deepest.new)
  rescue Parslet::ParseFailed => error
    puts error.parse_failure_cause.ascii_tree
  end

Outputs:

 
Expected at least 2 of CALL / ELEMENT at line 1 char 1.
`- Expected one of [CALL, ELEMENT] at line 1 char 4.
   |- Failed to match sequence ('baz' '()') at line 1 char 7.
   |  `- Premature end of input at line 1 char 7.
   `- Premature end of input at line 1 char 7.

The 'Deepest' position engine keeps the errors that occur farthest into the input. In some cases, this produces more readable output for the end user.

Line numbers from parser output

A traditional parser would parse and then perform several checking phases, for example verifying that all type constraints are respected in the input. During such a checking phase, you will most likely want to report screens full of type errors back to the user (‘cause that’s what types are for, right?). Now where did that ‘int’ come from?

Parslet gives you slices (Parslet::Slice) of input as part of your tree. These are essentially strings with line numbers. Here’s how to print that error message:


  # assume that type == "int"@0 - a piece from your parser output
  line, col = type.line_and_column
  puts "Sorry. Can't have #{type} at #{line}:#{col}!"

Precedence climber

You might want to implement a parser for simple arithmetic infix expressions such as `1 + 2`. The quickest way to do this with parslet is to use the infix expression parser atom:


  infix_expression(
    match('[0-9]').repeat(1),   # an operand needs at least one digit
    [str('*'), 2],
    [str('+'), 1]) # matches both "1+2*3" and "1*2+3"

Please also see the example and the inline documentation for this feature.