Transformation

Parslet parsers output deep nested hashes. Those are nice for printing, but hard to work with. The structure of the nested hashes is determined by the grammar and can thus vary largely. Testing for the presence of individual keys would produce code that is hard to read and maintain.

This is why parslet also comes with a hash transformation engine. To construct such a transform, you have to derive from Parslet::Transform:


  class MyTransform < Parslet::Transform
    rule('a') { 'b' }
  end
  MyTransform.new.apply('a') # => "b"

This is a transformation that replaces all ’a’s with ’b’s. A transformation rule has two parts: A pattern (here: 'a') and an action block ({ 'b' }).

The engine will go through the input and traverse the tree in depth-first post-order fashion. This means that for a given tree node, it will first visit the children and only then look at the node itself. While traversing, all rules are tested in the order in which they are defined. If a rule matches, the corresponding tree is replaced by whatever the action block returns.

Here’s another way of saying the same thing, perhaps more in line with what you need as a user of Parslet: Parslet::Transform is what allows you to transform the PORO-trees magically into a real abstract syntax tree. The rule definitions are the futuristic nano-machines that act on tree leaves first, eating them away and replacing them with contraptions of your own design. Here’s how that might look like in Ruby:


  tree = {:left => {:int => '1'}, 
          :op   => '+', 
          :right => {:int => '2'}}
        
  class Trans < Parslet::Transform
    rule(:int => simple(:x)) { Integer(x) }
  end
  Trans.new.apply(tree)     # => {:left=>1, :op=>"+", :right=>2}

You can start thinking about the leaves first, transforming those :int => '1' into real Ruby integers. This incremental (test driven!) approach will prevent your intermediary tree from turning into grey goo from too many nano-machines. Rules should in general be simple and transform a small part of the tree into a more useful variant. Turns out that if we were looking for an interpreter, one more rule will give us evaluation:


  tree = {:left => {:int => '1'}, 
          :op   => '+', 
          :right => {:int => '2'}}

  class Trans < Parslet::Transform
    rule(:int => simple(:x)) { Integer(x) }
    rule(:op => '+', :left => simple(:l), :right => simple(:r)) { l + r }
  end
  Trans.new.apply(tree)     # => 3

Cool, isn’t it? To recap: parslet intentionally spits out deep nested hashes, because it also gives you the tool to work with those. Turning the intermediary trees into something useful is really easy.

Working with Captures

What is this simple(symbol) business all about, you might ask. Glad you do.

Simple captures

Transform allows you to specify patterns that have wildcards in them. The wildcards match part of the tree, but at the same time capture it for working on it in your action block. The wildcard


  simple(:x)

will match any object BUT hashes or arrays. While this is obviously useful for capturing strings, you can also capture other ‘simple’ (as opposed to composed) objects of your own creation. simple(:x) would thus match all of these objects:


  "a string"
  123
  Foo.new(:some, :class, :instance)

If you think about what you’ll be doing to your intermediary trees, replacing leaves with more useful objects, simple really makes good sense, since it will stop you from matching entire subtrees.

Matching Repetitions and Sequences

Some patterns (like repetitions and sequences) produce arrays of objects as result. You can use simple(...) to replace all parts of these arrays with your own objects, but you cannot replace the array as a whole. This is the purpose of sequence(symbol):


  sequence(:x)

will match all of these:


  ['a', 'b', 'c']
  ['a', 'a', 'a']
  [Foo.new, Bar.new]

but not


  [{:a => :b}]
  [['a', 'b']]

Like its smaller brother, sequence is very picky about what it consumes and what not. All for the same reasons.

Matching entire subtrees

So you don’t want to listen and really want that big gun with the foot aiming addon. You’ll be needing subtree(symbol). It always matches. Nuff said.

Matching context

A match always binds in a context. The context consists of all bindings that were previously made. If you reuse the same symbol for two consecutive matches within the same pattern, the engine will assume that you want these two matched objects to be equal (under ==). This allows to specify constraints on your matches that would need code to express otherwise:


  # The following code is an excerpt from example/simple_xml.rb in the distro
  t.rule(
    open: {name: simple(:tag)}, 
    close: {name: simple(:tag)}, 
    inner: simple(:t)
  ) { 'verified' }

This replaces matching open and close tags with the word ‘verified’, consuming them from the tree and allowing the same rule to match higher up. A valid XML tree will leave only the word ‘verified’ behind, while the parser will stop at the problem nodes in invalid trees.

Transformation rules

In this chapter, we’ll look more closely at transformation rules and the different ways they can be laid out in your code.

Usage Patterns

The way the transformation engine is constructed, there is not one, but three ways to use it. Since at least one of those is inconvenient for you, the user, I am going to show only the remaining two, Variant 1 that produces an instance of the transform for direct use:


  # Variant 1
  transform = Parslet::Transform.new do
    rule(...) { ... } 
    rule(...) { ... } 
    rule(...) { ... } 
  end
  transform.apply(tree)

and Variant 2 that allows constructing the transformation as a class:


  # Variant 2
  class MyTransform < Parslet::Transform
    rule(...) { ... } 
    rule(...) { ... } 
    rule(...) { ... } 
  end
  MyTransform.new.apply(tree)

I guess both have their sweet spot.

Action blocks: Two flavors

As you might have noticed by now, parslet provides choice as well as nice parsers. To recap: Rules have a left side called pattern and a right side called action block:


  rule(PATTERN) {ACTION_BLOCK}

There are two ways of writing action blocks, and the difference might be fundamental to know to you one day. If written like this:


  rule(:foo => simple(:x)) { puts x }

the block will be able to access x as a local variable. This is very convenient and shortens the action code, often to the point of being very expressive.

But there is a big downside to this way of writing things: The action block must be executed in the context of some magic instance that has x as a local method (aka accessor). You can only have one self at any one time; variable access to the binding of the block isn’t possible inside this kind of action blocks:


  y = 12
  rule(:foo => simple(:x)) { Integer(x) + y }

This will (depending on the context) throw a NameError or a NoMethodError.

But this can be fixed by using the other, less elegant style for action blocks:


  y = 12
  rule(:foo => simple(:x)) { |dictionary| Integer(dictionary[:x]) + y }

In this second flavor, the block gets executed in the context of definition, whatever that was. This means that it can capture and access local variables just fine. Access to the bindings (called dictionary here) is more clumsy, but hey, you can’t have your cake and eat it too, I guess. Even though that is a pity.

A word on patterns

Given the PORO hash


  { 
    :dog => 'terrier', 
    :cat => 'suit' }

one might assume that the following rule matches :dog and replaces it by 'foo':


  rule(:dog => 'terrier') { 'foo' }

This is frankly impossible. How would 'foo' live besides :cat => 'suit' inside the hash? It cannot. This is why hashes are either matched completely, cats n’ all, or not at all.

Transformations are there for one thing: Getting out of the hash/array/slice mess parslet creates (on purpose) into the realm of your own beautifully crafted AST classes. Such AST nodes will generally correspond 1:1 to hashes inside your intermediary tree.

If transformations get you into a mess, remember this simple truth: They have been designed for the above purpose. Abusing them is fun (and almost all the examples in the project do so) but the mess you get when you do is all yours.

If you are really desperate, try to look at the example in Get Started or at the parser in the sample project wt. Imitating them would be a good first step. And if all else fails, we’re there for you, see the ‘Contact’ section in Contribute.

Summary

This concludes this (three part) introduction to parslet and leaves you with a good knowledge of most tricky parts. If you are missing some detail, maybe you can find it in the texts referenced here? There is also an entire page on the tricks useful in practice here: Tricks.

If not, please tell us about it. We’ll include it in this documentation in no time.