Michael Cordell's Blog

Reading Ruby Code: ROM - DSLs

11 Jun 2017

After a long hiatus, this is the fourth part of my ongoing series on code reading; the beginning can be found here.

The ability to easily define a domain-specific language (DSL) is one of Ruby’s powerful features. This gives us the readable syntax of RSpec test definitions:

RSpec.describe Order do
  it "sums the prices of its line items" do
    #...
  end
end

In terms of code reading, this introduces a new wrinkle. When a well-formed DSL is used, the defined language begins to blend in with the language’s own keywords. While the above is definitely Ruby code, it lacks the usual module, class, and def structure we find in most other .rb files. In fact, my editor even has a plugin that provides a whole new syntax mode for RSpec files on top of the Ruby mode. In exchange for this shift in structure, the writer and the reader get a “language” that maps more closely to the concepts it is modeling. When the writer uses a DSL well, it reveals intent and conveys meaning more quickly. Used poorly, or at least injudiciously, a DSL forces the reader to remember and hold onto concepts that might be much clearer in the language’s standard vernacular.
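To make the “blending in” concrete: describe and it are not keywords but ordinary method calls that take blocks. The sketch below is a toy, hypothetical MiniSpec module (not RSpec’s actual implementation) showing one way such a DSL can be wired up in plain Ruby.

```ruby
# Toy test DSL. MiniSpec is hypothetical; RSpec's real internals differ.
module MiniSpec
  # One recorded example: its description and the block that checks it
  Example = Struct.new(:description, :body)

  class Group
    attr_reader :subject, :examples

    def initialize(subject)
      @subject = subject
      @examples = []
    end

    # Inside a describe block, a bare `it` call lands here
    def it(description, &body)
      @examples << Example.new(description, body)
    end

    # Runs each example; an example passes if its block returns true
    def run
      @examples.map { |ex| [ex.description, ex.body.call == true] }
    end
  end

  # Evaluates the block with a fresh Group as self
  def self.describe(subject, &block)
    group = Group.new(subject)
    group.instance_exec(&block)
    group
  end
end

group = MiniSpec.describe "Array#sum" do
  it "adds its elements" do
    [1, 2, 3].sum == 6
  end
end

group.run # => [["adds its elements", true]]
```

Nothing here is special syntax: the “keywords” are methods, and the blocks are evaluated wherever the DSL decides to evaluate them.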

ROM uses domain-specific languages in a few places; a notable example is defining a relation’s schema. ROM provides methods such as attribute for specifying the attributes of a schema (naturally):

class Users < ROM::Relation[:http]
  schema do
    attribute :id, Types::Int
    attribute :name, Types::String
    attribute :age, Types::Int
  end
end

This doesn’t look all that different from defining attr_accessors on a plain old Ruby object (PORO):

class User
  attr_accessor :id, :name, :age
end

In fact, I’d suspect that the two classes above would have similar APIs (at least in terms of the readers/writers for those attributes). What the DSL version layers on top of that is a visual and syntactic similarity to the thing it is trying to model. Compare the above to the PostgreSQL definition of a users table:

CREATE TABLE "users"
(
  id integer,
  name text,
  age integer
);

The syntax of the ROM class above maps well to the syntax the database uses to describe its structured data. Remember, too, that ROM is trying to work with almost any type of data. The syntactic similarity therefore speaks more to the two languages hitting on the same commonality than to ROM being influenced by PostgreSQL syntax. We can draw this point out further by looking at a schema definition in Ecto (a database wrapper for the Elixir language). This example is drawn from the Ecto docs, and it is just a happy coincidence that they also used a User as the exemplar:

defmodule User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    field :age, :integer, default: 0
    has_many :posts, Post
  end
end

While each is slightly different, the above examples speak a kind of lingua franca for modeling structured data. For the reader, this means familiarity carries over: someone who knows one of these models doesn’t need to update their mental model much to understand what is going on in another. An interesting counterpoint is that this “common” language might encourage groupthink, pushing people towards homogeneous thinking and stifling innovation.

The DSL version also conveys two additional pieces of information over the PORO example. First, the type of each attribute: we can expect that name returns a string and that name= likely accepts one. Second, schema is some kind of special feature of Users. Because the writer used a special syntax (a class method taking a block), they have elevated its importance for this class, and as readers we should home in on it. Users is not merely a PORO data object; it also has this schema property that conveys additional functionality. If we don’t understand what a schema means for a class, it should be prioritized for further investigation.

Let’s sketch a naive implementation of the schema and attribute methods on the Relation class:

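module ROM
  class Relation
    # Naive approach: store attributes in a class-level hash
    def self.schema
      @schema ||= {}
      yield if block_given?  # the block runs with self set to the calling class
      @schema
    end

    def self.attribute(name, type)
      (@schema ||= {})[name] = type
    end
  end
end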
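Exercising a self-contained copy of this naive version shows both halves of the story: the schema block does populate the hash, but nothing confines attribute to that block. Plain Ruby classes such as Integer and String stand in for ROM’s Types here, an assumption made for brevity.

```ruby
# Self-contained copy of the naive sketch (not ROM's real code).
module ROM
  class Relation
    def self.schema
      @schema ||= {}
      yield if block_given?  # self inside the block is the subclass body
      @schema
    end

    def self.attribute(name, type)
      (@schema ||= {})[name] = type
    end
  end
end

class Users < ROM::Relation
  schema do
    attribute :id, Integer
    attribute :name, String
  end
end

# Nothing stops code outside the class from mutating the schema:
Users.attribute :password, String

Users.schema.keys # => [:id, :name, :password]
```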

This would meet the basic requirement of storing the schema. However, attribute is accessible outside of the schema block, and even outside of the Relation class. That is problematic because the relation may not need, or want, to expose this concept. Let’s look at how ROM’s implementation of the Relation class adds the methods from the ClassInterface module via extend:

This code sample is by rom-rb; you can view the full file here.

class Relation
  extend ClassInterface
  # ...
end
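For readers less familiar with extend: it adds a module’s methods to a single object, and when that object is a class, they become class methods. A minimal sketch with hypothetical names (Configurable, Widget), not taken from ROM:

```ruby
# Configurable and Widget are illustrative names only.
module Configurable
  # Becomes a class-level (singleton) method on any class that extends this module
  def config(value = nil)
    @config = value unless value.nil?
    @config
  end
end

class Widget
  extend Configurable # class methods come from the module

  # Instance methods remain the object's public API
  def describe
    "configured with #{self.class.config}"
  end
end

Widget.config(:fast)
Widget.new.describe             # => "configured with fast"
Widget.respond_to?(:config)     # => true
Widget.new.respond_to?(:config) # => false
```

The class-level configuration and the instance-level API end up in clearly separate places, which is the kind of partitioning extend makes cheap.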

Why place the class methods of Relation in a separate module? I suspect the reasoning is to separate the Relation’s direct API (its instance methods) from the internal workings of the class. That way it is clear how other objects interact with a Relation instance, and also clear how to modify aspects of the class. From that module we get the schema method below:

This code sample is by rom-rb; you can view the full file here.

def schema(dataset = nil, infer: false, &block)
  if defined?(@schema)
    @schema
  elsif block || infer
    self.dataset(dataset) if dataset
    self.register_as(self.dataset) unless register_as

    name = Name[register_as, self.dataset]
    inferrer = infer ? schema_inferrer : nil
    dsl = schema_dsl.new(name, inferrer, &block)

    @schema = dsl.call
  end
end

The schema method returns the @schema instance variable if it is defined. Otherwise, when a block is given or inference is requested, it uses an instance of Schema::DSL to create a new schema. This is a more verbose form of memoization than the usual Ruby idiom:

def result
  @result ||= complicated_function
end

It also has the advantage that a memoized nil value keeps the function from running again, because defined? checks whether the variable has been assigned rather than whether its value is truthy:

[1] pry(main)> defined?(@test)
=> nil
[2] pry(main)> @test = nil
=> nil
[3] pry(main)> defined?(@test)
=> "instance-variable"
[4] pry(main)> @test ||= -> { puts "hello" }.call
hello
=> nil
[5] pry(main)> defined?(@test) || -> { puts "hello" }.call
=> "instance-variable"
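The same difference shows up inside ordinary methods. In this sketch (a hypothetical Lookup class whose expensive method simply returns nil and counts its calls), the ||= version recomputes on every call, while the defined? version computes once:

```ruby
# Lookup is a hypothetical class contrasting two memoization styles.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # ||= re-runs the computation whenever the cached value is nil or false
  def with_or_equals
    @a ||= expensive
  end

  # defined? only checks whether @b was ever assigned, so a cached nil sticks
  def with_defined
    return @b if defined?(@b)
    @b = expensive
  end

  private

  # Stand-in for a costly computation that legitimately returns nil
  def expensive
    @calls += 1
    nil
  end
end

a = Lookup.new
3.times { a.with_or_equals }
a.calls # => 3

b = Lookup.new
3.times { b.with_defined }
b.calls # => 1
```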

When a block is passed in, the schema method initializes a DSL instance with the name, the inferrer, and the block, then builds the schema by invoking the DSL’s call method. Here is the DSL’s constructor:

This code sample is by rom-rb; you can view the full file here.

def initialize(name, inferrer, &block)
  @name = name
  @inferrer = inferrer
  @attributes = nil

  if block
    instance_exec(&block)
  elsif inferrer.nil?
    raise ArgumentError,
      'You must pass a block to define a schema or set an inferrer for automatic inferring'
  end
end

The instance_exec method executes the block passed to schema within the context of the newly created DSL instance. This gives the block access to the DSL’s instance methods, such as attribute:

This code sample is by rom-rb; you can view the full file here.

def attribute(name, type)
  @attributes ||= {}
  @attributes[name] = type.meta(name: name)
end
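Why instance_exec rather than yield? With yield, the block keeps the caller’s self, so a bare attribute call would not reach the DSL object. A small sketch with a hypothetical Collector class, assuming nothing from ROM:

```ruby
# Collector is a hypothetical class contrasting yield with instance_exec.
class Collector
  attr_reader :names

  def initialize
    @names = []
  end

  # The DSL verb; only reachable when self is a Collector
  def attribute(name)
    @names << name
  end

  # yield: the block keeps the caller's self, so it needs an explicit receiver
  def self.with_yield
    collector = new
    yield collector
    collector
  end

  # instance_exec: self inside the block becomes the collector
  def self.with_instance_exec(&block)
    collector = new
    collector.instance_exec(&block)
    collector
  end
end

explicit = Collector.with_yield { |c| c.attribute(:id) }
implicit = Collector.with_instance_exec { attribute :id }

# A bare `attribute` under yield is looked up on the caller (here, top-level main)
error_name =
  begin
    Collector.with_yield { |_c| attribute :id }
  rescue NoMethodError => e
    e.name
  end

explicit.names # => [:id]
implicit.names # => [:id]
error_name     # => :attribute
```

The trade-off is that inside an instance_exec block, self no longer refers to the caller, so the caller’s own methods and instance variables are not directly reachable there (local variables are still captured by the closure).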

Once the block has run, the calls to attribute have populated the @attributes hash, so the data needed to build a Schema instance is present within the DSL instance. Schema’s constructor accepts these pieces:

This code sample is by rom-rb; you can view the full file here.

def initialize(name, attributes, inferrer: nil, associations: EMPTY_ASSOCIATION_SET)
  @name = name
  @attributes = attributes
  @associations = associations
  @inferrer = inferrer
end

The DSL’s call method, which we saw invoked earlier in the Relation’s schema method, uses this prepared data to build the Schema instance:

This code sample is by rom-rb; you can view the full file here.

def call
  Schema.new(name, attributes, inferrer: inferrer && inferrer.new(self))
end

To review, the class methods of Relation are partitioned into a separate module, ClassInterface. Its schema method instantiates a Schema::DSL object with the block passed to it. That block is executed within the context of the DSL instance, which lets the class-level caller define the attributes that are eventually used to build a Schema object. Finally, the DSL’s call method creates the schema from those attributes.
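The whole flow can be condensed into a compact sketch. The names echo ROM’s (ClassInterface, a DSL builder, Schema), but this is a simplified illustration, not ROM’s implementation: it assumes no inferrer, datasets, associations, or typed attributes, and it derives the schema name from the class name.

```ruby
# Simplified illustration of the pattern; SchemaDSL is a hypothetical stand-in for Schema::DSL.
Schema = Struct.new(:name, :attributes)

class SchemaDSL
  def initialize(name, &block)
    @name = name
    @attributes = {}
    instance_exec(&block) # DSL verbs like `attribute` resolve on this builder
  end

  def attribute(name, type)
    @attributes[name] = type
  end

  # Builds the final value object from what the block collected
  def call
    Schema.new(@name, @attributes.freeze)
  end
end

module ClassInterface
  # Memoized with defined? so the schema is built at most once per class
  def schema(&block)
    return @schema if defined?(@schema)
    @schema = SchemaDSL.new(name.downcase.to_sym, &block).call if block
  end
end

class Relation
  extend ClassInterface
end

class Users < Relation
  schema do
    attribute :id, Integer
    attribute :name, String
  end
end

Users.schema.name             # => :users
Users.schema.attributes.keys  # => [:id, :name]
Users.respond_to?(:attribute) # => false
```

Because attribute lives only on the builder, the leak from the naive version disappears: a relation class cannot have attributes added to it from outside the schema block.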

What are the advantages of this approach to designing a DSL?

While the DSL pattern outlined above is repeated throughout the ROM codebase, in this particular instance it’s worth noting that the approach serves another purpose: the Schema::DSL object acts as a builder for a Schema object. Rather than asking the user of the Relation class to build that object explicitly, or to learn a complicated set of configuration options for the Schema class, the DSL provides an intuitive interface while hiding the details of how a Schema is actually built. This DSL-over-configuration approach is an interesting pattern and worth remembering in your own code.
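To illustrate the builder point, here is a hypothetical comparison. Attribute, Schema, and SchemaBuilder are illustrative names, loosely echoing how ROM’s DSL attaches each attribute’s name via type.meta(name: name). Building the schema by hand requires knowing the constructor and repeating each name; the DSL fills that in:

```ruby
# Hypothetical types: each attribute must carry its own name.
Attribute = Struct.new(:name, :type)
Schema = Struct.new(:name, :attributes)

# Configuration style: the caller must know Schema's constructor and
# repeat every attribute name (once as a key, once inside the Attribute)
by_hand = Schema.new(:users, {
  id:   Attribute.new(:id, Integer),
  name: Attribute.new(:name, String)
})

# Builder style: the DSL supplies the repetitive details
class SchemaBuilder
  def self.build(name, &block)
    builder = new
    builder.instance_exec(&block)
    Schema.new(name, builder.attributes)
  end

  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  def attribute(name, type)
    @attributes[name] = Attribute.new(name, type)
  end
end

built = SchemaBuilder.build(:users) do
  attribute :id, Integer
  attribute :name, String
end

built == by_hand # => true
```

Both paths produce the same value; the DSL simply moves the construction details out of the caller’s hands.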

DSLs are an important tool for both readers and writers of code. We reviewed ROM’s self-contained way of building a DSL and saw how it can be used to great effect, both for modeling the underlying domain and for providing an alternative to configuration. As readers of code, we should pay attention to where DSLs are used because they elevate the importance of that code: the ROM sample above describes the data the library works with (a key part of the library itself), and the RSpec example creates a test (the goal of a test suite). Additionally, when it comes to DSLs we should focus on how the new “language” articulates the underlying concepts. If the goal of a DSL is to map more closely to the domain, what intent and concepts is it trying to convey? I believe this question applies equally to the reader and the writer.

Thanks for reading; I look forward to more Ruby code reading with you in the future.
