Creating immutable tree data structures in Ruby
What are the advantages of immutable data structures outside functional programming? We explore the benefits of immutability when designing a search query API in Ruby.
A data structure is immutable if it cannot be changed after it is created. In a recent Ruby application we decided to use immutability to our advantage. This article explains how. We will look at an interesting property of the API of Active Record and see how we can easily implement a similar API with an immutable tree structure.
Abstract syntax trees
One of our customers has a web application that is built with elasticsearch, an open source search engine. We developed an interface to query the search index: a simple API that makes it easy to build complex search queries. For example, if we want to get a list of the first five products sorted by name, this is how we might retrieve them:
The search queries are ultimately represented in JSON, but before they're sent to the search index they are internally represented as an abstract syntax tree. The query above might have a syntax tree roughly similar to the one below:
Before explaining more about our implementation, let's briefly discuss the interfaces of Active Record and the library that powers it: Arel. The reason for this detour is that they are similar to the API we created for our search index. It will also illustrate an important advantage of immutable data structures: persistence.
Let's look at the API of Arel first. It is used internally by Active Record and can be accessed by calling the
arel_table method on any Active Record model. If we call
to_sql on the objects returned by Arel, we can inspect the SQL query that is generated.
In the examples below, notice how calling a query method changes the original object. In fact, all methods simply return the same object:
reports = Report.arel_table all_reports = reports.project(reports[Arel.star]) all_reports.to_sql #=> SELECT "reports".* FROM "reports" all_reports.where(reports[:published].eq(true)).to_sql #=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' all_reports.take(5).to_sql #=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' LIMIT 5 all_reports.to_sql #=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' LIMIT 5
Now let's compare this behaviour with Active Record. Its query methods return objects that represent the results of a database query. We can also inspect them by calling
As you can see, calling a query method on an object does not change the previous objects:
all_reports = Report.scoped all_reports.to_sql #=> SELECT "reports".* FROM "reports" all_reports.where(published: true).to_sql #=> SELECT "reports".* FROM "reports" WHERE "reports"."published" = 't' all_reports.limit(5).to_sql #=> SELECT "reports".* FROM "reports" LIMIT 5 all_reports.to_sql #=> SELECT "reports".* FROM "reports"
Data structures that behave like Active Record's interface are called persistent data structures. Each operation exposes a new version of the data structure while the original is unmodified. This is very helpful if multiple queries are derived from the same basic query.
Active Record goes through great lengths to achieve a persistent interface. A lot of code is needed to keep track of the input to each query method and deliver it to Arel to build the SQL queries.
When designing the API for our search index, we discussed the various ways to build a syntax tree while having an interface that is persistent. Some options are:
- Store the input to query methods and return a new value object that encapsulates all parameters. Delay building the syntax tree until the very last.
- Build the syntax tree immediately but clone the entire tree every time a query method causes a change in the tree. Return the new tree. We can't reuse parts of the old tree because they may be modified accidentally.
- Build the syntax tree immediately but use an immutable tree data structure. Immutable structures have a persistent interface by definition. We can reuse parts of the old tree without any risk.
Active Record uses the first approach. It works well, but is certainly not the easiest. The second approach is simpler but is needlessly expensive, especially with complex queries.
The third approach is quite efficient and simple to implement. It has the additional benefit of guaranteed thread-safety. The remainder of this article further explores how to use immutable trees in Ruby to represent an abstract syntax tree.
Immutable tree structures
Returning to our search index API, how do we use immutable trees to our advantage? Suppose we construct two queries, the second one being a variation on the first one:
query1 = Document.where(tags: "product").sort("name") query2 = query1.take(5)
The calls to the query methods are internally converted to an abstract syntax tree. The first query contains a reference to a where node and to a sort node. The second query consists of a new root node but shares the where and sort nodes with the first query.
Because each node is immutable there is no risk in sharing the subtrees. We share them because it's both easier to implement and more efficient.
Immutable objects in Ruby
Our first step is to make sure we can make objects immutable after their creation. For this purpose we're going to override the
new class method. We call
freeze on all newly created nodes as well as their instance variables. Our changes are isolated in a module.
module Immutable def new(*) # Create a new object, then freeze it along with its instance variables. object = super object.instance_variables.each do |var| object.instance_variable_get(var).freeze end object.freeze end end
As long as the instance variables are primitives such as strings, it is impossible to change the state of a new node. If instance variables can be arrays, hashes or other complex types, we should recursively freeze them to ensure immutability. Because an abstract syntax tree typically only has primitive values or arrays of other (frozen) nodes we don't necessarily have to deal with it.
class ExampleNode extend Immutable attr_reader :value def initialize(value) @value = value end end node = ExampleNode.new(str) node.value.replace("update") # RuntimeError: can't modify frozen String node.instance_variable_set(:@value, "update") # RuntimeError: can't modify frozen ExampleNode
Implementation of an immutable syntax tree
With basic immutability implemented we can create classes for some of the nodes we need in our syntax trees. Except for the initialiser, none of the instance methods we define can modify the object state. We must return a replacement node instead.
class Node extend Immutable end class QueryNode < Node attr_reader :where_node, :sort_node, :limit_node def initialize(where_node = WhereNode.new, sort_node = nil, limit_node = nil) @where_node, @sort_node, @limit_node = where_node, sort_node, limit_node end def where(predicates) QueryNode.new(where_node.append(predicates), sort_node, limit_node) end def sort(field) QueryNode.new(where_node, SortNode.new(field), limit_node) end def take(limit) QueryNode.new(where_node, sort_node, LimitNode.new(limit)) end end class WhereNode < Node attr_reader :condition_nodes def initialize(condition_nodes = ) @condition_nodes = condition_nodes end def append(predicates) appended_condition_nodes = predicates.map do |field, value| EqualityConditionNode.new(field, value) end WhereNode.new(condition_nodes + appended_condition_nodes) end end class LimitNode < Node attr_reader :limit def initialize(limit) @limit = limit end end # And so on... SortNode and EqualityConditionNode omitted.
We want to make the query methods available as class methods on a
Document model. All we have to do is delegate each method to an empty query node.
require "forwardable" class Document class << self extend Forwardable delegate [:where, :sort, :take] => :all def all QueryNode.new end end end
This is just a very basic example, but we've already managed to implement a working API:
query1 = Document.where(tags: "product").sort("name") query2 = query1.take(5) query2.limit_node #=> #<LimitNode @limit=5> query1.limit_node #=> nil
The original query remained unaffected! Because we used an immutable tree structure the interface of the syntax tree is guaranteed to be persistent.
Immutability has practical applications outside functional programming. We explored how to implement immutable abstract syntax trees in a non-functional language like Ruby.
The resulting implementation has clear advantages over a traditional mutable tree structure. By definition it is thread-safe and it has a persistent interface. This means we can use multiple versions of the same tree without worrying about unintended changes or data corruption.
Thread-safety and persistent interfaces usually require some careful thought when implemented in a mutable data structure. However, they emerged automatically from our decision to favour strictly enforced immutability. An immutable implementation will therefore most likely contain fewer errors.