Creating immutable tree data structures in Ruby

What are the advantages of immutable data structures outside functional programming? We explore the benefits of immutability when designing a search query API in Ruby.

A data structure is immutable if it cannot be changed after it is created. In a recent Ruby application we decided to use immutability to our advantage. This article explains how. We will look at an interesting property of the API of Active Record and see how we can easily implement a similar API with an immutable tree structure.

Abstract syntax trees

One of our customers has a web application that is built with elasticsearch, an open source search engine. We developed an interface to query the search index: a simple API that makes it easy to build complex search queries. For example, if we want to get a list of the first five products sorted by name, this is how we might retrieve them:

Document.where(tags: "product").sort("name").take(5)

The search queries are ultimately represented in JSON, but before they're sent to the search index they are internally represented as an abstract syntax tree. The query above might have a syntax tree roughly similar to the one below:

Before explaining more about our implementation, let's briefly discuss the interfaces of Active Record and the library that powers it: Arel. The reason for this detour is that they are similar to the API we created for our search index. It will also illustrate an important advantage of immutable data structures: persistence.

Persistent interfaces

Let's look at the API of Arel first. It is used internally by Active Record and can be accessed by calling the arel_table method on any Active Record model. If we call to_sql on the objects returned by Arel, we can inspect the SQL query that is generated.

In the examples below, notice how calling a query method changes the original object. In fact, all methods simply return the same object:

reports = Report.arel_table
all_reports = reports.project(reports[Arel.star])

all_reports.to_sql
#=> SELECT "reports".* FROM "reports"

all_reports.where(reports[:published].eq(true)).to_sql
#=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't'

all_reports.take(5).to_sql
#=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' LIMIT 5

all_reports.to_sql
#=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' LIMIT 5

Now let's compare this behaviour with Active Record. Its query methods return objects that represent the results of a database query. We can also inspect them by calling to_sql.

As you can see, calling a query method on an object does not change the previous objects:

all_reports = Report.scoped

all_reports.to_sql
#=> SELECT "reports".* FROM "reports"

all_reports.where(published: true).to_sql
#=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't'

all_reports.limit(5).to_sql
#=> SELECT "reports".* FROM "reports" LIMIT 5

all_reports.to_sql
#=> SELECT "reports".* FROM "reports"

Data structures that behave like Active Record's interface are called persistent data structures. Each operation exposes a new version of the data structure while the original is unmodified. This is very helpful if multiple queries are derived from the same basic query.

Active Record goes through great lengths to achieve a persistent interface. A lot of code is needed to keep track of the input to each query method and deliver it to Arel to build the SQL queries.

When designing the API for our search index, we discussed the various ways to build a syntax tree while having an interface that is persistent. Some options are:

  • Store the input to query methods and return a new value object that encapsulates all parameters. Delay building the syntax tree until the very last.
  • Build the syntax tree immediately but clone the entire tree every time a query method causes a change in the tree. Return the new tree. We can't reuse parts of the old tree because they may be modified accidentally.
  • Build the syntax tree immediately but use an immutable tree data structure. Immutable structures have a persistent interface by definition. We can reuse parts of the old tree without any risk.

Active Record uses the first approach. It works well, but is certainly not the easiest. The second approach is simpler but is needlessly expensive, especially with complex queries.

The third approach is quite efficient and simple to implement. It has the additional benefit of guaranteed thread-safety. The remainder of this article further explores how to use immutable trees in Ruby to represent an abstract syntax tree.

Immutable tree structures

Returning to our search index API, how do we use immutable trees to our advantage? Suppose we construct two queries, the second one being a variation on the first one:

query1 = Document.where(tags: "product").sort("name")
query2 = query1.take(5)

The calls to the query methods are internally converted to an abstract syntax tree. The first query contains a reference to a where node and to a sort node. The second query consists of a new root node but shares the where and sort nodes with the first query.

Because each node is immutable there is no risk in sharing the subtrees. We share them because it's both easier to implement and more efficient.

Immutable objects in Ruby

Our first step is to make sure we can make objects immutable after their creation. For this purpose we're going to override the new class method. We call freeze on all newly created nodes as well as their instance variables. Our changes are isolated in a module.

module Immutable
  def new(*)
    # Create a new object, then freeze it along with its instance variables.
    object = super
    object.instance_variables.each do |var|
      object.instance_variable_get(var).freeze
    end
    object.freeze
  end
end

As long as the instance variables are primitives such as strings, it is impossible to change the state of a new node. If instance variables can be arrays, hashes or other complex types, we should recursively freeze them to ensure immutability. Because an abstract syntax tree typically only has primitive values or arrays of other (frozen) nodes we don't necessarily have to deal with it.

class ExampleNode
  extend Immutable

  attr_reader :value

  def initialize(value)
    @value = value
  end
end

node = ExampleNode.new(str)

node.value.replace("update")
# RuntimeError: can't modify frozen String

node.instance_variable_set(:@value, "update")
# RuntimeError: can't modify frozen ExampleNode

Implementation of an immutable syntax tree

With basic immutability implemented we can create classes for some of the nodes we need in our syntax trees. Except for the initialiser, none of the instance methods we define can modify the object state. We must return a replacement node instead.

class Node
  extend Immutable
end

class QueryNode < Node
  attr_reader :where_node, :sort_node, :limit_node

  def initialize(where_node = WhereNode.new, sort_node = nil, limit_node = nil)
    @where_node, @sort_node, @limit_node = where_node, sort_node, limit_node
  end

  def where(predicates)
    QueryNode.new(where_node.append(predicates), sort_node, limit_node)
  end

  def sort(field)
    QueryNode.new(where_node, SortNode.new(field), limit_node)
  end

  def take(limit)
    QueryNode.new(where_node, sort_node, LimitNode.new(limit))
  end
end

class WhereNode < Node
  attr_reader :condition_nodes

  def initialize(condition_nodes = [])
    @condition_nodes = condition_nodes
  end

  def append(predicates)
    appended_condition_nodes = predicates.map do |field, value|
      EqualityConditionNode.new(field, value)
    end
    WhereNode.new(condition_nodes + appended_condition_nodes)
  end
end

class LimitNode < Node
  attr_reader :limit

  def initialize(limit)
    @limit = limit
  end
end

# And so on... SortNode and EqualityConditionNode omitted.

We want to make the query methods available as class methods on a Document model. All we have to do is delegate each method to an empty query node.

require "forwardable"

class Document
  class << self
    extend Forwardable
    delegate [:where, :sort, :take] => :all

    def all
      QueryNode.new
    end
  end
end

This is just a very basic example, but we've already managed to implement a working API:

query1 = Document.where(tags: "product").sort("name")
query2 = query1.take(5)

query2.limit_node
#=> #<LimitNode @limit=5>

query1.limit_node
#=> nil

The original query remained unaffected! Because we used an immutable tree structure the interface of the syntax tree is guaranteed to be persistent.

Summary

Immutability has practical applications outside functional programming. We explored how to implement immutable abstract syntax trees in a non-functional language like Ruby.

The resulting implementation has clear advantages over a traditional mutable tree structure. By definition it is thread-safe and it has a persistent interface. This means we can use multiple versions of the same tree without worrying about unintended changes or data corruption.

Thread-safety and persistent interfaces usually require some careful thought when implemented in a mutable data structure. However, they emerged automatically from our decision to favour strictly enforced immutability. An immutable implementation will therefore most likely contain fewer errors.