Reading Ruby Code: ROM - DSLs
After a long hiatus, this is the fourth part of my ongoing series on code reading. The beginning can be found here.
The ability to easily define a domain specific language (DSL) is one of Ruby’s powerful features. This gives us the readable syntax of RSpec test definitions:
RSpec.describe Order do
  it "sums the prices of its line items" do
    #...
  end
end
In terms of code reading, this introduces a new wrinkle. When a well-formed DSL is used,
the defined language begins to blend in with the language’s own
keywords. While the above is definitely Ruby code, it lacks the usual
module, class, and def structure we find in most other .rb files. In fact,
my editor even has a plugin that provides a whole new syntax mode for RSpec files
on top of the Ruby mode. In exchange for this shift in structure, the writer and
the reader get a “language” that maps more tightly to the concepts it is
modeling. When the writer of the code uses DSLs correctly, they reveal
intent and convey meaning more quickly. Used incorrectly, or at least injudiciously,
they impose a burden on the reader to remember and hold onto concepts that may
be much clearer in the language’s standard vernacular.
ROM uses domain specific languages in a few places; a notable example is defining a relation’s
schema. ROM provides methods such as attribute for specifying the
attributes of a schema (naturally):
class Users < ROM::Relation[:http]
  schema do
    attribute :id, Types::Int
    attribute :name, Types::String
    attribute :age, Types::Int
  end
end
This doesn’t look all that different from defining attr_accessors on a plain old Ruby object (PORO):
class User
  attr_accessor :id, :name, :age
end
In fact, I’d suspect that the two classes above would have similar APIs (at least in terms of the readers and writers for those attributes). What the DSL version layers on top of that is a visual and syntactical similarity to what it is trying to model. Compare the above to the PostgreSQL definition of a users table:
CREATE TABLE "users"
(
  id integer,
  name text,
  age integer
)
The syntax of the ROM class above maps well to the syntax that the DB uses to describe its structured data. Also remember that ROM is trying to work with almost any type of data. Therefore, the syntactical similarity speaks more to the fact that these two languages are hitting on the same commonality, rather than ROM being influenced by PostgreSQL syntax. We can draw this point out further if we look at the definition of a struct in Ecto (a database wrapper in the Elixir language). The example is drawn from the Ecto docs, and it is just a happy coincidence that they were using a User as their exemplar:
defmodule User do
  use Ecto.Schema
  schema "users" do
    field :name, :string
    field :age, :integer, default: 0
    has_many :posts, Post
  end
end
While all slightly different, the above examples are speaking some type of lingua franca for modeling structured data. What this means for the reader is that if they are familiar with one of the above models, they don’t need to update their mental model too much to understand what is going on in another. An interesting counterpoint to this “common” language idea might be that it encourages groupthink, perhaps pushing people towards homogeneous thinking and stifling innovation.
The DSL version also conveys two additional pieces of information over the PORO
example. First, the type of each attribute. We can expect that name
should return a string and name= likely accepts a string. Secondly, schema
is some type of special feature of the Users relation. Since we are using a special syntax
(and class method), the writer has elevated its importance for this class. As a
reader we should home in on this as an area to focus on. Users are not merely
some PORO data object; they also have this schema property that conveys
additional functionality. If we don’t understand what a schema is for a class,
it should be prioritized for further investigation.
Let’s talk about a naive implementation of the schema and attribute methods
on the Relation class:
class ROM::Relation
  def self.schema(&block)
    @schema ||= {}
    yield
  end

  def self.attribute(attribute, type)
    @schema[attribute] = type
  end
end
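To try the naive version without loading the rom gem, here is a minimal sketch. NaiveRelation is a hypothetical stand-in for ROM::Relation, plain Ruby classes stand in for the Types constants, and schema is tweaked slightly to return the hash so the result is easy to inspect.

```ruby
# NaiveRelation is a hypothetical stand-in for the naive ROM::Relation
# above, so the sketch runs without the rom gem.
class NaiveRelation
  def self.schema(&block)
    @schema ||= {}
    yield if block
    @schema
  end

  def self.attribute(name, type)
    @schema[name] = type
  end
end

class Users < NaiveRelation
  schema do
    attribute :id, Integer
    attribute :name, String
  end
end

# attribute is an ordinary public class method, so nothing stops a
# caller from using it long after the schema block has finished.
Users.attribute :age, Integer

p Users.schema
```

The last call succeeds and silently adds :age to the stored schema, even though it happens outside the schema block and outside the class body.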
This naive version would meet the basic requirement of storing the schema. However, it
suffers from the fact that attribute is accessible outside of the schema
block, and even outside of the Relation class. This is problematic because the
relation may not need or want to expose this concept. Let’s look at how ROM’s
implementation of the Relation class adds the methods from the
ClassInterface module via extend:
lib/rom/relation.rb
This code sample is by rom-rb, you can view the full file here.
# @api public
class Relation
  extend ClassInterface
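For readers less familiar with extend, here is a small sketch of what it does. ClassMacros, Base, Accounts, and table_name are hypothetical names chosen for illustration and are not part of ROM.

```ruby
# Hypothetical module and class names, for illustration only.
module ClassMacros
  # Becomes a class-level method on whatever class extends this module.
  def table_name(name = nil)
    @table_name = name if name
    @table_name
  end
end

class Base
  extend ClassMacros
end

class Accounts < Base
  # Subclasses inherit the class-level methods added by extend.
  table_name :accounts
end

p Accounts.table_name
p Accounts.new.respond_to?(:table_name)
```

The module’s methods land on the class itself (and its subclasses) rather than on instances, which is why the second check reports that instances do not respond to table_name.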
Why place the class methods of Relation in a separate module? I suspect the
reasoning is to separate the Relation class’s direct API (instance
methods) from the internal workings of the class. In that way it is clear how
other objects interact with a Relation instance, while also clear how to
modify aspects of the class.
From that module we get the schema method below:
lib/rom/relation/class_interface.rb
This code sample is by rom-rb, you can view the full file here.
# @api public
def schema(dataset = nil, infer: false, &block)
  if defined?(@schema)
    @schema
  elsif block || infer
    self.dataset(dataset) if dataset
    self.register_as(self.dataset) unless register_as

    name = Name[register_as, self.dataset]
    inferrer = infer ? schema_inferrer : nil
    dsl = schema_dsl.new(name, inferrer, &block)

    @schema = dsl.call
  end
end
The schema method returns the @schema instance variable if it is defined.
Otherwise, when a block is given or inference is requested, it uses an instance
of Schema::DSL to create a new schema instance. This is memoization in a more
verbose form than the usual Ruby idiom:
def result
  @result ||= complicated_function
end
The defined? form has the added advantage that a memoized nil value still keeps
the computation from running again, whereas ||= recomputes whenever the stored
value is nil or false:
[1] pry(main)> defined?(@test)
=> nil
[2] pry(main)> @test = nil
=> nil
[3] pry(main)> defined?(@test)
=> "instance-variable"
[4] pry(main)> @test ||= -> { puts "hello" }.call
hello
=> nil
[5] pry(main)> defined?(@test) || -> { puts "hello" }.call
=> "instance-variable"
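The same difference can be seen at the method level. The sketch below uses a hypothetical Lookup class whose expensive computation legitimately returns nil, and counts how often it runs under each memoization style.

```ruby
# Hypothetical class: the expensive computation legitimately returns nil.
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # ||= re-runs the computation whenever the stored value is nil or false.
  def with_or_equals
    @with_or_equals ||= expensive
  end

  # defined? only checks whether the variable has been assigned,
  # so a memoized nil is respected.
  def with_defined
    return @with_defined if defined?(@with_defined)

    @with_defined = expensive
  end

  private

  def expensive
    @calls += 1
    nil
  end
end

a = Lookup.new
2.times { a.with_or_equals }

b = Lookup.new
2.times { b.with_defined }

puts a.calls # 2: the nil result was recomputed
puts b.calls # 1: the nil result was memoized
```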
When a block is passed in, the schema method initializes a DSL
instance with the name, the inferrer, and the block, then builds the schema by
invoking that instance’s call method.
lib/rom/schema/dsl.rb
This code sample is by rom-rb, you can view the full file here.
# @api private
def initialize(name, inferrer, &block)
  @name = name
  @inferrer = inferrer
  @attributes = nil

  if block
    instance_exec(&block)
  elsif inferrer.nil?
    raise ArgumentError,
          'You must pass a block to define a schema or set an inferrer for automatic inferring'
  end
end
The instance_exec method executes the block from schema within the context of
the newly created DSL instance. This gives the block access to the DSL’s
instance methods, such as attribute:
lib/rom/schema/dsl.rb
This code sample is by rom-rb, you can view the full file here.
# @api public
def attribute(name, type)
  @attributes ||= {}
  @attributes[name] = type.meta(name: name)
end
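To see instance_exec in isolation, here is a minimal sketch. AttributeCollector is a hypothetical class loosely shaped like the DSL object described above, not ROM’s actual implementation, and it stores plain Ruby classes rather than ROM types.

```ruby
# Hypothetical builder, loosely modelled on the shape described above.
class AttributeCollector
  attr_reader :attributes

  def initialize(&block)
    @attributes = {}
    # Inside the block, self is this collector, so a bare `attribute`
    # call resolves to the instance method below.
    instance_exec(&block) if block
  end

  def attribute(name, type)
    @attributes[name] = type
  end
end

collector = AttributeCollector.new do
  attribute :id, Integer
  attribute :name, String
end

p collector.attributes.keys
```

The block is written at the top level, where no attribute method exists, yet it runs without error because instance_exec swaps self for the collector while the block executes.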
Once the block has run, the calls to attribute have populated the
@attributes hash. So, the data needed to populate a Schema instance is
present within this DSL instance:
lib/rom/schema.rb
This code sample is by rom-rb, you can view the full file here.
# @api private
def initialize(name, attributes, inferrer: nil, associations: EMPTY_ASSOCIATION_SET)
  @name = name
  @attributes = attributes
  @associations = associations
  @inferrer = inferrer
end
This prepared data can be used to populate an instance with the call
method that we saw earlier in the Relation’s schema method:
lib/rom/schema/dsl.rb
This code sample is by rom-rb, you can view the full file here.
# @api private
def call
  Schema.new(name, attributes, inferrer: inferrer && inferrer.new(self))
To review, the class methods of Relation are partitioned into a separate module,
which defines a schema method that instantiates a Schema::DSL object with the
block passed into it. The block is executed within the context of that
instance, allowing the caller of the class method to define the attributes that
are eventually used to build a Schema object. Finally, the call method on the
Schema::DSL instance creates the schema from these attributes.
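The whole flow can be condensed into a short, self-contained sketch. Everything prefixed with Mini is a hypothetical simplification written for this post, not ROM’s code: it skips datasets, inference, and type metadata, and keeps only the shape of the pattern.

```ruby
# A simplified, hypothetical re-creation of the pattern; names prefixed
# with Mini are not part of ROM.
MiniSchema = Struct.new(:name, :attributes)

class MiniSchemaDSL
  def initialize(name, &block)
    @name = name
    @attributes = {}
    instance_exec(&block) if block
  end

  # Only reachable while the schema block runs against this instance.
  def attribute(name, type)
    @attributes[name] = type
  end

  # Builder step: turn the collected attributes into a schema object.
  def call
    MiniSchema.new(@name, @attributes.freeze)
  end
end

module MiniClassInterface
  def schema(name = nil, &block)
    if defined?(@schema)
      @schema
    elsif block
      @schema = MiniSchemaDSL.new(name, &block).call
    end
  end
end

class MiniRelation
  extend MiniClassInterface
end

class Users < MiniRelation
  schema(:users) do
    attribute :id, Integer
    attribute :name, String
  end
end

p Users.schema.attributes
p Users.respond_to?(:attribute)
```

Unlike the naive version, the relation class itself never gains an attribute method, so the second check reports that Users does not respond to it.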
What are the advantages of this approach to designing a DSL?
- The DSL for schema is self-contained. If a future developer wanted to add a new schema method, they would know exactly where to go.
- Executing the schema block within an instance is a shrewd move because it keeps the code simpler: there is no need to juggle class methods or pass many variables around.
While the DSL pattern outlined above is repeated throughout the ROM codebase, in
this particular instance it’s worth noting that the approach serves another purpose.
The Schema::DSL object is being used as a builder for preparing a
Schema object. Rather than asking the user of the Relation class to explicitly
build the object themselves, or learn a complicated set of configuration options
for the Schema class, the DSL provides an intuitive interface while hiding the
details of how to actually build a Schema object. This DSL-over-configuration
is an interesting pattern, and worth remembering in your own code writing.
DSLs are an important tool for both readers and writers of code. We reviewed ROM’s self-contained method of building a DSL, which showed how a DSL can be used to great effect for modeling the underlying domain and providing an alternative to configuration. As readers of code, we should pay attention to where DSLs are used because they elevate the importance of that piece of code. The ROM sample above describes the data it is working with (a key part of the library itself), while the RSpec example creates a test (the goal of a test suite). Additionally, when it comes to DSLs we should focus on how the new “language” articulates the underlying concepts. If the goal of a DSL is to map more closely to the domain, what intent and concepts is the DSL trying to convey? I believe this question applies equally to the reader and the writer.
Thanks for reading. I look forward to more Ruby code reading with you in the future.