Lambda Island

Datomic is a database based on the same principles that underlie the design of Clojure itself. Learn what makes it different, and how to start using it from Clojure immediately.

This first part gives an overview of the architecture and data model, and walks you through your first transactions.

Transcript

Datomic is a database system by Cognitect, the same company that oversees the development of Clojure. It is written in Clojure and built on the same principles as Clojure itself, so using it from Clojure is a smooth experience.

Datomic is an “immutable” database. It stores a history of facts. Once a fact is in the database you can’t change it or delete it, but you can store a new fact that supersedes or invalidates the old one.

This way Datomic can easily tell which facts existed at a certain point in time. Such a collection of facts is called a database value. For Datomic there’s little difference between the state of the database today and the state of the database one year ago. They’re simply represented as separate database values.

Datomic’s architecture is a bit different from that of other database systems. In a traditional database, clients connect to a database server, which then handles all database-related tasks. You can imagine there’s a part in there that takes care of updates, and a part that lets you read stuff from the database. In the diagram I’ve labeled these as “Transactor” and “Query Engine”.

Datomic pulls that Transactor out into its own process. There’s only ever one of these, and any write operations go through it.

The Query engine gets moved into the client process. It pulls raw data directly from storage, then does the actual querying and filtering locally.

Instead of being “dumb” clients, these processes are now part of the Datomic network, so instead of “clients” they’re called “peers”.

Peers read data directly from storage, and then cache it locally, making queries very fast. Peers are also notified by the transactor of any new data, so they’re always up to date.

This peer model does have two downsides. The first is that the peer library is only available on the JVM. When you’re using Clojure that’s not an issue, but not everyone is so lucky.

Secondly, peers require a lot of memory, since they’ll be holding large parts of the database in memory. This isn’t suitable for all applications.

Because of this, Cognitect introduced a new component called the Peer server. It connects to storage and to the transactor like other peers, but then exposes an HTTP Client API, granting access to Datomic from thinner clients.

For the actual physical storage of bytes Datomic relies on pre-existing solutions, rather than re-inventing the wheel. There are a number of different storage backends available, depending on your needs and deployment strategy. Amazon DynamoDB and good old PostgreSQL are popular options.

The “facts” that Datomic stores are five-tuples called datoms. Of these five elements the first three are the most important. They are called the “entity id”, the “attribute” and the “value”, and together they state a proposition like “‘JOHN’ has as ‘EMAIL’ ‘JOHN@PUPPY.WORLD’”.

A proposition like this, however, does not constitute a fact. If I say “the temperature in Berlin is 25 degrees Celsius”, then that’s not a fact, because at some points in time this will be true, and at other times it won’t. So to make it into a fact you need some kind of timestamp.

This is what the fourth element is for. It’s called the transaction id, and it allows Datomic to find out when this datom was created. So together these first four elements state that from a certain point in time onwards, this fact is true.

A fact that was added to Datomic may become false at a later point in time, so you need to be able to retract a fact that was previously added.

This is what the final element is for. It’s a boolean flag, which indicates whether this datom represents an addition or a retraction of a fact.

 entity   attribute          value         transaction   added?
   🡓         🡓                 🡓               🡓          🡓
[ 98431  :user/email   "john@puppy.world"   137394134    true ]

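So if this user first sets their email address to “john@puppy.world”, then later changes it to “john@insect.club”, and then a bit later still deletes the email address entirely, then you might have this sequence of datoms in Datomic. (For a cardinality-one attribute Datomic also records a retraction of the old address in the same transaction that adds the new one. That datom is not shown here.)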

[ 98431  :user/email   "john@puppy.world"   137300111    true ]
[ 98431  :user/email   "john@insect.club"   137300555    true ]
[ 98431  :user/email   "john@insect.club"   137300999    false ]

In representations the transaction id and “added” flag are often omitted, since you usually care most about the entity, attribute, and value.

[ 98431  :user/email   "john@puppy.world" ]
[ 98431  :user/email   "john@insect.club" ]
[ 98431  :user/email   "john@insect.club" ]
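To make the link between a history of datoms and a “database value” concrete, here is a toy model in plain Clojure. It is only a sketch and not how Datomic is implemented: datoms are ordinary vectors, `facts-as-of` is a name made up for this example, and every attribute is assumed to hold a single value, so a later addition replaces an earlier one.

```clojure
;; The full five-element datoms from the email example:
;; [entity attribute value transaction added?]
(def datoms
  [[98431 :user/email "john@puppy.world" 137300111 true]
   [98431 :user/email "john@insect.club" 137300555 true]
   [98431 :user/email "john@insect.club" 137300999 false]])

(defn facts-as-of
  "Hypothetical helper: replays every datom whose transaction id is <= tx,
  oldest first, and returns the set of [e a v] triples that hold afterwards.
  Assumes single-valued attributes: an addition overwrites the previous
  value, a retraction removes the value only if it is the current one."
  [datoms tx]
  (->> datoms
       (filter (fn [[_ _ _ t _]] (<= t tx)))
       (sort-by (fn [[_ _ _ t _]] t))
       (reduce (fn [state [e a v _ added?]]
                 (cond
                   added?                   (assoc state [e a] v)
                   (= v (get state [e a]))  (dissoc state [e a])
                   :else                    state))
               {})
       (map (fn [[[e a] v]] [e a v]))
       set))

(facts-as-of datoms 137300111) ;;=> #{[98431 :user/email "john@puppy.world"]}
(facts-as-of datoms 137300555) ;;=> #{[98431 :user/email "john@insect.club"]}
(facts-as-of datoms 137300999) ;;=> #{}
```

Each call returns an immutable set, which is the sense in which the past and the present are just separate values. With a real database the peer library’s `d/as-of` function plays this role.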

Now let’s take these three datoms. They are all for the same entity, but each describes a different attribute.

When there are multiple datoms all for the same entity, then you can more conveniently represent them in an entity map, mapping attributes to values. The entity id is assigned to the special key :db/id.

;; Datoms
[ 31874  :user/name      "jillosaurus" ]  
[ 31874  :user/email     "jill@insect.club" ]  
[ 31874  :user/location  "Bug, Bamberg" ]  

;; Entity map
{:db/id          31874
 :user/name      "jillosaurus"   
 :user/email     "jill@insect.club"   
 :user/location  "Bug, Bamberg"}
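The two representations carry the same information, which a few lines of plain Clojure can show. This is an illustrative sketch only: `datoms->entity-map` and `entity-map->datoms` are hypothetical helpers written for this example, not functions from the Datomic API, and they assume each attribute has a single value.

```clojure
(def jill-datoms
  [[31874 :user/name     "jillosaurus"]
   [31874 :user/email    "jill@insect.club"]
   [31874 :user/location "Bug, Bamberg"]])

(defn datoms->entity-map
  "Folds the [e a v] datoms of entity eid into a single map.
  Datoms that belong to other entities are skipped."
  [eid datoms]
  (reduce (fn [m [e a v]]
            (if (= e eid) (assoc m a v) m))
          {:db/id eid}
          datoms))

(defn entity-map->datoms
  "The reverse direction: one [e a v] datom per attribute in the map."
  [entity-map]
  (let [eid (:db/id entity-map)]
    (for [[a v] (dissoc entity-map :db/id)]
      [eid a v])))

(datoms->entity-map 31874 jill-datoms)
;;=> {:db/id 31874
;;    :user/name "jillosaurus"
;;    :user/email "jill@insect.club"
;;    :user/location "Bug, Bamberg"}
```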

In this episode I’m going to use “Datomic Free”. This edition can be freely downloaded without having to register, and the peer library is even available from Clojars, so you can load it directly with Leiningen or Boot.

The “free” edition does have some limitations. The only storage backends available are in-memory storage, or a file-system based store that’s suitable for tinkering and small hobby projects.

Alternatively, you can register to get the “Datomic Pro Starter Edition”. This is still free and it gives you access to all the same features as the Pro or Enterprise version. Updates are technically limited to one year, but you can keep using the software afterwards without limitations.

Create a new Clojure project and add datomic-free as a dependency to the project.clj. This JAR contains Datomic’s “Peer Library”, allowing you to communicate with the Transactor, and to read from storage.

(defproject datomic-quick-start "0.1.0-SNAPSHOT"
  :description "Datomic Quick Start"
  :url "https://github.com/lambdaisland/datomic-quick-start"
  :license {:name "Mozilla Public License 2.0"
            :url "https://www.mozilla.org/en-US/MPL/2.0/"}
  :dependencies [[org.clojure/clojure "1.9.0-beta2"]
                 [com.datomic/datomic-free "0.9.5561.59"]])

If you had a look at the official documentation you probably first came across examples that use datomic.client. This is for when you are using the Client API with a peer server. I’m not going to cover that.

Instead I’ll use the peer library directly, by requiring the datomic.api namespace, which is typically aliased to d.

I’m going to use an in-memory database. This way you don’t have to run a separate transactor process. It does mean that anything you store will be lost when you exit the process.

If you do want the changes to persist then you need to use the bin/transactor script to start a transactor, passing it a properties file. There’s an example properties file included with Datomic Free that works out of the box. Afterwards you connect to the URI that it shows you.

But like I said I’m actually going to skip that step and use an in-memory database, since this is the easiest way to get a feel for Datomic.

This is what an in-memory URI looks like, with the last part being the name you chose for the database.

Since this is the first time you’re using it you have to create the database first with create-database. After that you can connect to it, and get a connection object back.

If you want a clean slate again then delete the database, and create and connect to it again.

(ns datomic-quick-start.core
  (:require [datomic.api :as d]))

(def db-uri "datomic:mem://quick-start-db")

;;(d/delete-database db-uri)
(d/create-database db-uri)

(def conn (d/connect db-uri))
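When experimenting at the REPL, the delete, create and connect steps can be wrapped in one small function. This is a minimal sketch: `fresh-conn` and the database name `quick-start-scratch` are made up for this example, and it assumes the datomic-free peer library from the project.clj above is on the classpath.

```clojure
(require '[datomic.api :as d])

(defn fresh-conn
  "Hypothetical helper: drops the database at uri if there is one,
  creates it again, and returns a connection to the now empty database."
  [uri]
  (d/delete-database uri)
  (d/create-database uri)
  (d/connect uri))

;; A scratch database, separate from quick-start-db.
(def scratch-conn (fresh-conn "datomic:mem://quick-start-scratch"))
```

Calling `fresh-conn` again with the same URI throws away everything stored so far, including attribute definitions, so it is only suitable for throwaway in-memory databases.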

Now let’s try to add some data. Before you do that, though, you need to tell Datomic about the attributes you’ll be using, together with their types. They need to be defined before they can be used.

Take this example from earlier. It makes use of three attributes, :user/name, :user/email, and :user/location, which are all of type string. Here you see it again both as individual datoms, and as the equivalent entity map.

;; Datoms
[ 31874  :user/name      "jillosaurus" ]  
[ 31874  :user/email     "jill@insect.club" ]  
[ 31874  :user/location  "Bug, Bamberg" ]  

;; Entity map
{:db/id          31874
 :user/name      "jillosaurus"   
 :user/email     "jill@insect.club"   
 :user/location  "Bug, Bamberg"}

Attributes are themselves entities. This probably sounds very meta and abstract, but it’s part of what makes Datomic’s design so interesting. All of a database’s meta-information is stored in the database right alongside your own data.

To define an attribute you need to provide at least three things: an identifier, which is typically a namespaced keyword; the value type of the attribute, in this case string; and the cardinality. A user only has one name, so :user/name gets :db.cardinality/one, but a user’s hobbies could have :db.cardinality/many.

You can also optionally provide a docstring for your attributes, which is generally a good idea. I’m also going to set a uniqueness constraint on this attribute. This automatically adds an index, so you can look up users by user name later on. For more information on defining a Datomic schema please refer to the docs.

I’m using an entity map to create this attribute. Notice that there’s no :db/id in here. This is fine: Datomic will pick an entity id when the entity is created.

{;; :db/id       chosen by datomic
 ;; :db/doc      "optional docstring"
 :db/ident       :user/name
 :db/valueType   :db.type/string
 :db/cardinality :db.cardinality/one}

Now you can send this entity map to the transactor, so it can be written to the database, using the transact function, passing it a connection object, and a list of entity maps.

(d/transact conn [{:db/ident       :user/name
                   :db/doc         "The unique username of a user."
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one
                   :db/unique      :db.unique/identity}])
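The earlier example also used :user/email and :user/location. They could be defined in the same way, as sketched below. The docstrings, the database name `schema-sketch` and the var names are assumptions made for this example, and a separate in-memory database is used so the snippet runs on its own.

```clojure
(require '[datomic.api :as d])

;; A separate in-memory database, so this sketch does not touch quick-start-db.
(def schema-uri "datomic:mem://schema-sketch")
(d/create-database schema-uri)
(def schema-conn (d/connect schema-uri))

;; Both attributes are plain strings with a single value per user.
(def user-attributes
  [{:db/ident       :user/email
    :db/doc         "The email address of a user."
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :user/location
    :db/doc         "Where the user lives, as free-form text."
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Several attributes can go into a single transaction.
;; Dereferencing the result makes a rejected transaction throw right here.
@(d/transact schema-conn user-attributes)

;; Attributes are entities, so they can be looked up by their ident.
(def email-attr (d/entity (d/db schema-conn) :user/email))

(:db/doc email-attr)
;;=> "The email address of a user."
```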

With that :user/name attribute defined it’s time to create some users! Let’s make another transaction, this time passing it two entity maps, one for each user. Assign the result to a var, so you can take it apart to see what’s in there.

(def tx-result
  (d/transact conn [{:user/name    "jillosaurus"}
                    {:user/name    "jonnyboy"}]))

The result of transact is a future, a kind of reference type, so to get its contents you will have to “deref” it.

(class tx-result)
;;=> datomic.promise$settable_future$reify__5815

After dereferencing, the result of a transaction behaves like a Clojure map with four keys.

:db-before and :db-after are “databases” or database values. You can best think of them as snapshots. They represent the state of the world at a given point in time. :db-before is how things were before the transaction was processed, and :db-after is how things are afterwards.

So you could query :db-after and find an entity with username “jillosaurus”, which you won’t find in the :db-before.

I know I’m jumping ahead here, I’ll explain queries in more detail in a bit. The thing to note is that when looking for “jillosaurus” in :db-after you get a single entity id back, whereas in the :db-before it doesn’t find anything, since the user hasn’t been created yet.

(keys @tx-result)
;;=> (:db-before :db-after :tx-data :tempids)

(:db-before @tx-result)
;;=> datomic.db.Db@1e985da6

(:db-after @tx-result)
;;=> datomic.db.Db@ad344ee4

(d/q '[:find ?e :where [?e :user/name "jillosaurus"]] (:db-after @tx-result))
;;=> #{[17592186045418]}

(d/q '[:find ?e :where [?e :user/name "jillosaurus"]] (:db-before @tx-result))
;;=> #{}
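The :tx-data key holds the datoms that the transaction added, which ties the result back to the five-element tuples from earlier. The sketch below prints them in readable form. It sets up its own in-memory database so it can run in isolation. The names `tx-uri`, `tx-conn`, `result` and `readable-datoms` are made up for this example.

```clojure
(require '[datomic.api :as d])

(def tx-uri "datomic:mem://tx-data-sketch")
(d/create-database tx-uri)
(def tx-conn (d/connect tx-uri))

;; The same :user/name attribute as in the main text.
@(d/transact tx-conn [{:db/ident       :user/name
                       :db/valueType   :db.type/string
                       :db/cardinality :db.cardinality/one
                       :db/unique      :db.unique/identity}])

(def result @(d/transact tx-conn [{:user/name "jillosaurus"}]))

;; Each element of :tx-data is a datom. Its five parts can be read with
;; the keys :e, :a, :v, :tx and :added. The attribute is stored as an
;; entity id, so d/ident is used to turn it back into a keyword.
(def readable-datoms
  (for [datom (:tx-data result)]
    [(:e datom)
     (d/ident (:db-after result) (:a datom))
     (:v datom)
     (:tx datom)
     (:added datom)]))

(doseq [datom readable-datoms]
  (println datom))
```

Besides the :user/name datom you should also see a :db/txInstant datom. The transaction is itself an entity, and Datomic records the time at which it was processed.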

It’s interesting and useful that calling transact returns these two database values, but most of the time you just want the state of the database the way it is “right now”. For this you use the d/db function.

Keep in mind that this database value still represents a snapshot. Once you have a database value it never changes. To see changes that happened later on you need to call d/db afresh. Holding on to a stale database value is an extremely common mistake when getting started with Datomic.

(d/db conn)
;;=> datomic.db.Db@4e0454d1

(d/q '[:find ?e :where [?e :user/name "jillosaurus"]] (d/db conn))
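The snapshot behaviour is easy to see by holding on to a database value while another transaction goes through. The following is a small sketch that sets up its own in-memory database. The names `snap-uri`, `snap-conn`, `all-names` and `old-db` are made up for this example.

```clojure
(require '[datomic.api :as d])

(def snap-uri "datomic:mem://snapshot-sketch")
(d/create-database snap-uri)
(def snap-conn (d/connect snap-uri))

@(d/transact snap-conn [{:db/ident       :user/name
                         :db/valueType   :db.type/string
                         :db/cardinality :db.cardinality/one
                         :db/unique      :db.unique/identity}])
@(d/transact snap-conn [{:user/name "jillosaurus"}])

;; Finds the names of all users in whichever database value it is given.
(def all-names '[:find ?name :where [_ :user/name ?name]])

;; Take a snapshot, then add a second user.
(def old-db (d/db snap-conn))
@(d/transact snap-conn [{:user/name "jonnyboy"}])

;; The old snapshot still contains only "jillosaurus".
(d/q all-names old-db)

;; A fresh database value contains both users.
(d/q all-names (d/db snap-conn))
```

The same query form runs against both values. Only the database value passed as the argument decides which point in time you are looking at.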

Datomic Quickstart, part 1