7. Intro to clojure.spec

Freebie

Published 08 June 16

The spec library, which will be included in Clojure 1.9, gives you a powerful mechanism for validating data. Find out how to use it in this episode.

Around the web

From the horse’s mouth, the official sources on clojure.spec:

Blogs and further reading:

Variants

The conformed value of s/alt or s/or is known as a “variant”.

(s/conform (s/alt :n number? :k keyword?) [:a])
;;=> [:k :a]

(s/conform (s/or :n number? :k keyword?) 5)
;;=> [:n 5]

These work really well in combination with core.match.
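core.match is a separate dependency, so here is a minimal sketch of the same idea with plain destructuring and case instead. The ::amount spec and the describe function are made up for illustration. The require assumes the alpha namespace used in this episode.

```clojure
;; Assumes Clojure 1.9.0-alpha4, as used in this episode.
;; In the final 1.9 release the namespace is clojure.spec.alpha.
(require '[clojure.spec :as s])

;; Hypothetical spec: a value is either a number or a keyword.
(s/def ::amount (s/or :n number? :k keyword?))

(defn describe
  "Dispatches on the tag of the variant returned by s/conform."
  [x]
  (let [conformed (s/conform ::amount x)]
    (if (= conformed ::s/invalid)
      "not a number or keyword"
      ;; A variant is a [tag value] pair, so it destructures nicely.
      (let [[tag value] conformed]
        (case tag
          :n (str "number " value)
          :k (str "keyword " (name value)))))))

(describe 5)   ;;=> "number 5"
(describe :a)  ;;=> "keyword a"
(describe "x") ;;=> "not a number or keyword"
```

With core.match the case expression would become a match on the whole vector, but the shape of the data is the same.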

Two weeks ago Rich Hickey announced a major new library for Clojure, called clojure.spec.

A lot of people are excited about spec, and rightfully so. Let’s see what spec can do for you. I’ll just start with an empty project and project.clj file.

~ $ mkdir robochef
~ $ cd robochef
~/robochef $ touch project.clj
~/robochef $

Clojure.spec will be bundled with the upcoming 1.9 release of Clojure. We can already try it out today by using the alpha version.

(defproject robochef "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0-alpha4"]])

Require spec in your namespace, and get a REPL going. So far so good.

(ns robochef.core
  (:require [clojure.spec :as s]))

In the age of the Internet of Things, we are building a robot chef that can cook delicious meals straight from the cloud. At the heart of our system is an innocuous-looking function, cook!.

(defn cook! [recipe]
  ,,,)

We pass our Robochef a recipe, consisting of ingredients and steps. Here’s an example.

This looks simple enough: it’s a nice, clean interface, something even a stay-at-home dad can manage. Still, as we’re building this, a nagging feeling of unease creeps up on us. What if you pass the chef an invalid recipe? Will it politely mention that something’s amiss, or will it start hurling eggs at the cloud-connected fridge?

(def recipe
  {::ingredients [1 :kg "aubergine"
                  20 :ml "soy sauce"]
   ::steps ["fry the aubergines"
            "add soy sauce"]})

(cook! recipe)

Clearly we need a way to validate recipes, so they can’t make it into our system and wreak havoc. This is where clojure.spec comes in.

The s/valid? function takes two arguments. The first argument is the “spec”. Here we pass it the simplest kind of spec: a function which returns true or false, also known as a predicate function. The second argument is the value we want to validate.

valid? will also return true or false, so if we wrap that in assert our function will fail early when given invalid input, which is what we want.

(defn cook! [recipe]
  (assert (s/valid? map? recipe))
  ,,,)

Besides valid?, spec also provides the function conform. If the value is valid then it is simply returned, although perhaps in a slightly different form.

If the value is invalid, then the special keyword :clojure.spec/invalid is returned.

Don’t worry too much about conform yet, we’ll get back to it later.

(s/conform map? {::ingredients []})
;;=> {:robochef.core/ingredients []}

(s/conform map? "foo")
;;=> :clojure.spec/invalid

Our recipe validation doesn’t amount to much yet. Let’s improve it a bit. Predicate specs are only the beginning, the smallest building blocks. Clojure.spec provides a whole bunch of functions and macros to create more complicated specs.

Recipes have to be maps, with keywords for keys, and vectors for values. We can express this with a map-of spec.

This still allows all kinds of gibberish to be passed in, but at least it’s already stopping some nonsense from coming through.

(defn cook! [recipe]
  (assert (s/valid? (s/map-of keyword? vector?) recipe))
  ,,,)

(cook! {::ingredients "chunky garlic"})
;; => 1. Unhandled java.lang.AssertionError
;;       Assert failed: (s/valid? (s/map-of keyword? vector?) recipe)

Spec maintains a global registry of specs, so you can refer to them simply by name, which must be a namespace-qualified keyword. There’s a good chance we’ll have to validate recipes in other places as well, so let’s register our budding spec with s/def.

(s/def ::recipe (s/map-of keyword? vector?))

(defn cook! [recipe]
  (assert (s/valid? ::recipe recipe))
  ,,,)

The steps of a recipe are a vector of strings, which is also easy to spec with coll-of. This says that steps is a collection of strings, more specifically a vector. Now we can add an extra assertion to check the recipe’s steps.

(s/def ::recipe (s/map-of keyword? vector?))
(s/def ::steps (s/coll-of string? []))

(defn cook! [recipe]
  (assert (s/valid? ::recipe recipe))
  (assert (s/valid? ::steps (::steps recipe)))
  ,,,)

Using coll-of is fine for our steps, but the ingredients don’t form a homogeneous collection, so we will need a more powerful tool to validate those.

Ingredients form a sequence of triplets: an amount, which is a number; a unit of measurement, expressed as a keyword; and the name of the ingredient, which is a string.

In computer science, we have a generic mechanism to validate and match sequences, known as regular expressions. Regexes are most commonly used to match sequences of characters, or strings, but spec allows you to write regexes for arbitrary sequences of values. That’s pretty cool, right!

(s/def ::ingredients (s/* ::ingredient))
(s/def ::ingredient (s/cat :amount number?
                           :unit keyword?
                           :name string?))

Let’s dive into that a little more. Clojure.spec provides five operators for constructing regular expressions over sequences.

The asterisk, plus sign, and question mark are known as “quantifiers”. These also exist in regular expressions for strings, and they have the exact same function. The asterisk matches zero or more items, the plus sign matches one or more items, and the question mark matches at most one item.

So in this case any number of keywords in a sequence will match, including none at all, but if there’s something else in there it won’t be valid.

;; #"k*"
(s/valid? (s/* keyword?) [])         ;;=> true
(s/valid? (s/* keyword?) [:a])       ;;=> true
(s/valid? (s/* keyword?) [:a :b :c]) ;;=> true
(s/valid? (s/* keyword?) [:a 5 :c])  ;;=> false

Plus is almost identical, except that now the empty vector is no longer valid: you need at least one item to be present.

;; #"k+"
(s/valid? (s/+ keyword?) [])         ;;=> false
(s/valid? (s/+ keyword?) [:a])       ;;=> true
(s/valid? (s/+ keyword?) [:a :b :c]) ;;=> true
(s/valid? (s/+ keyword?) [:a 5 :c])  ;;=> false

The question mark is for marking something as optional. It either matches nothing at all, or a single item, but never more than that.

;; #"k?"
(s/valid? (s/? keyword?) [])         ;;=> true
(s/valid? (s/? keyword?) [:a])       ;;=> true
(s/valid? (s/? keyword?) [:a :b :c]) ;;=> false

The cat operator lets you say: “first this, then that, then something else”. It lets you combine any number of predicates and specs in a specific order.

Here we’re matching any sequence of two elements, consisting of a number followed by a keyword.

We have to name these parts that we’re matching, here I’m just calling them :num and :key.

;; #"nk"
(s/valid? (s/cat :num number?
                 :key keyword?) [5 :a]) ;;=> true

(s/valid? (s/cat :num number?
                 :key keyword?) [:a 5]) ;;=> false

(s/valid? (s/cat :num number?
                 :key keyword?) [5 :a :b]) ;;=> false

Now it’s time to have another look at conform. Remember that it either returns a “conformed” value, or the keyword :clojure.spec/invalid.

(s/conform (s/* number?) [1 2 3])
;;=> [1 2 3]
(s/conform (s/* number?) [1 2 "a"])
;;=> :clojure.spec/invalid

For most specs we’ve seen so far, the conformed output will be equal to the input, but cat is different. Instead of conforming to the input sequence, it conforms to a result map, built up using the keys we specified.

(s/conform (s/cat :num number?
                  :key keyword?) [5 :a])
;;=> {:num 5, :key :a}

Now we should be able to understand the spec for ingredients. A single ingredient consists of a quantity, unit, and name. The ingredient list is a succession of any number of ingredients.

Now by using conform, not only do we know if the ingredient list is valid, but we get it back in a format that’s much more suitable for further processing.

(s/def ::ingredient (s/cat :quantity number?
                           :unit keyword?
                           :name string?))

(s/def ::ingredients (s/* ::ingredient))

(s/conform ::ingredients [4 :g "Monkey-picked Anxi Wulong Tea"
                          200 :ml "Boiling water"])
;; => [{:quantity 4,   :unit :g,  :name "Monkey-picked Anxi Wulong Tea"}
;;     {:quantity 200, :unit :ml, :name "Boiling water"}]

There’s one more regex operator we haven’t covered yet, which is alt. It’s the equivalent of the vertical bar (or “pipe”) character in normal regex syntax: it lets you express a choice. Just as with cat, we are forced to pick names for the possible alternatives, but again this comes in handy when using conform.

In this case the conformed value isn’t a map but a vector, containing the key and the matched value. This style of tagging values is known as “variants”, and there was a whole talk about it at Clojure/conj two years back. I’ll add a link to the show notes.

;; #"(n|k)"
(s/conform (s/alt :num number?
                  :key keyword?) [5])
;;=> [:num 5]

(s/conform (s/alt :num number?
                  :key keyword?) [:a])
;;=> [:key :a]

(s/conform (s/alt :num number?
                  :key keyword?) ["b"])
;;=> :clojure.spec/invalid

One thing that takes a little bit of getting used to is that you can combine and nest any of these regex operators, and they will still just match a single sequence.

So even though in this example we’re nesting the + operator inside cat, it won’t match nested vectors in the input.

This distinction doesn’t come up when using regular expressions over strings, because you can’t have a string inside a string, the way that you can have a vector inside a vector. A string is always a flat sequence of characters.

;; #"n+k+"
(s/valid? (s/cat :nums (s/+ number?)
                 :keys (s/+ keyword?)) [5 6 7 :a :b :c]) ;;=> true

(s/valid? (s/cat :nums (s/+ number?)
                 :keys (s/+ keyword?)) [[5 6 7] [:a :b :c]]) ;;=> false

(s/conform (s/cat :nums (s/+ number?)
                  :keys (s/+ keyword?)) [5 6 7 :a :b :c])
;;=> {:nums [5 6 7], :keys [:a :b :c]}
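The same nesting works with the other quantifiers. As a small sketch, here is a made-up ::loose-ingredient spec (not part of the robochef code) where the unit is optional, using the question mark operator inside cat. When the optional part is absent, its key is simply left out of the conformed map.

```clojure
;; Assumes Clojure 1.9.0-alpha4, as used in this episode.
;; In the final 1.9 release the namespace is clojure.spec.alpha.
(require '[clojure.spec :as s])

;; Hypothetical variation on ::ingredient: the unit may be omitted.
(s/def ::loose-ingredient
  (s/cat :quantity number?
         :unit (s/? keyword?)
         :name string?))

(s/conform ::loose-ingredient [1 :kg "aubergine"])
;;=> {:quantity 1, :unit :kg, :name "aubergine"}

(s/conform ::loose-ingredient [2 "eggs"])
;;=> {:quantity 2, :name "eggs"}

;; Still one flat sequence, even with three levels of regex operators.
(s/conform (s/* ::loose-ingredient) [2 "eggs" 200 :ml "milk"])
;;=> [{:quantity 2, :name "eggs"} {:quantity 200, :unit :ml, :name "milk"}]
```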

To match a nested collection inside a regex spec, you have two options. You can use a non-regex spec that describes the nested collection. In this example the inner vectors are each validated with coll-of, and we use cat to say that the one with numbers comes first, followed by the one with keywords.

(s/conform (s/cat :nums (s/coll-of number? [])
                  :keys (s/coll-of keyword? [])) [[5 6 7] [:a :b :c]])
;;=> {:nums [5 6 7], :keys [:a :b :c]}

If both the inner and outer collections are validated with regex specs, then you need some way to specify where the nesting starts. You can do this by wrapping a regex spec in a call to s/spec. This way the regex matching will start anew on the current item.

(s/conform (s/cat :nums (s/* number?)
                  :keys (s/* keyword?)) [5 6 7 :a :b :c])
;;=> {:nums [5 6 7], :keys [:a :b :c]}

(s/conform (s/cat :nums (s/spec (s/* number?))
                  :keys (s/spec (s/* keyword?))) [[5 6 7] [:a :b :c]])
;;=> {:nums [5 6 7], :keys [:a :b :c]}
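To tie this back to the recipes: suppose the ingredients came in as one vector per ingredient instead of a single flat sequence. This is a hypothetical input format, the robochef itself uses the flat one. A sketch of how s/spec would handle that, with the made-up name ::nested-ingredients:

```clojure
;; Assumes Clojure 1.9.0-alpha4, as used in this episode.
;; In the final 1.9 release the namespace is clojure.spec.alpha.
(require '[clojure.spec :as s])

;; Hypothetical spec: each ingredient is its own nested vector.
;; s/spec makes the inner cat start matching on a new sequence.
(s/def ::nested-ingredients
  (s/* (s/spec (s/cat :amount number?
                      :unit keyword?
                      :name string?))))

(s/conform ::nested-ingredients [[1 :kg "aubergine"]
                                 [20 :ml "soy sauce"]])
;;=> [{:amount 1, :unit :kg, :name "aubergine"}
;;    {:amount 20, :unit :ml, :name "soy sauce"}]

;; Without the nesting in the input, the spec no longer matches.
(s/valid? ::nested-ingredients [1 :kg "aubergine"])
;;=> false
```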

Back to our recipes. We have specs set up for ingredients and steps, so now we can tie them together to validate recipes in one step, and thanks to the power of conform we can process the result much more easily.

The only missing “ingredient” (if you will) is the s/keys macro. Before, we used map-of to validate recipes, but map-of doesn’t tell you anything about what values can be used for specific keys.

We pass keys two keyword arguments: a vector of keys that are required to be present, and a vector of optional keys. We’re making steps optional; if it’s omitted, the robochef will simply mix all the ingredients together.

These lists of required and optional keys again use namespace-qualified keywords. They serve a double purpose: spec will look for these keywords as keys in the map, and it will look for a registered spec with that same name to validate the corresponding value.

(s/def ::recipe (s/keys :req [::ingredients]
                        :opt [::steps]))

(s/def ::ingredient (s/cat :quantity number?
                           :unit keyword?
                           :name string?))

(s/def ::ingredients (s/* ::ingredient))

(s/def ::steps (s/coll-of string? []))

(def recipe
  {::ingredients [1 :kg "aubergine"
                  20 :ml "soy sauce"]
   ::steps ["fry the aubergines"
            "add soy sauce"]})

(s/conform ::recipe recipe)
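To give an idea of what processing the conformed result might look like, here is a minimal sketch that turns a recipe into a shopping list. The shopping-list function and its output format are made up for illustration, and the sketch leaves out ::steps to keep things short.

```clojure
;; Assumes Clojure 1.9.0-alpha4, as used in this episode.
;; In the final 1.9 release the namespace is clojure.spec.alpha.
(require '[clojure.spec :as s])

(s/def ::ingredient (s/cat :quantity number?
                           :unit keyword?
                           :name string?))
(s/def ::ingredients (s/* ::ingredient))
(s/def ::recipe (s/keys :req [::ingredients]))

(defn shopping-list
  "Hypothetical helper: returns one line per ingredient,
  or nil when the recipe does not conform to ::recipe."
  [recipe]
  (let [conformed (s/conform ::recipe recipe)]
    (when-not (= conformed ::s/invalid)
      ;; Each conformed ingredient is a map, so we can destructure by key.
      (mapv (fn [{:keys [quantity unit] ingredient :name}]
              (str quantity " " (name unit) " " ingredient))
            (::ingredients conformed)))))

(shopping-list {::ingredients [1 :kg "aubergine"
                               20 :ml "soy sauce"]})
;;=> ["1 kg aubergine" "20 ml soy sauce"]

(shopping-list {::ingredients [1 "kg" "aubergine"]})
;;=> nil
```

Note that the function never has to count off items in groups of three. The spec already did that work, and conform handed back the result as maps.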

Now we can validate our recipe in a single step again. If a user passes in an invalid recipe though, then the bland AssertionError they’re getting isn’t very helpful.

The nice thing is we can ask spec to explain itself. The explain function will generate an error message, and print it to stdout. If you want to get the message as a string, you can use explain-str, whereas with explain-data you get a data structure containing the same information.

These error messages are a bit hard to read, and I really hope they get better before the final release, but at least all the information is there.

(s/explain-str ::recipe {::steps [5]})
;;=> "val: {:robochef.core/steps [5]} fails predicate: [(contains? % :robochef.core/ingredients)]\nIn: [:robochef.core/steps] val: [5] fails spec: :robochef.core/steps at: [:robochef.core/steps] predicate: (coll-checker string?)\n"

(s/explain-data ::recipe {::steps [5]})
;;=> {:clojure.spec/problems {[] {:pred [(contains? % :robochef.core/ingredients)], :val {:robochef.core/steps [5]}, :via [], :in []}, [:robochef.core/steps] {:pred (coll-checker string?), :val [5], :via [:robochef.core/steps], :in [:robochef.core/steps]}}}

Now let’s use this to throw a more descriptive exception.

(defn cook! [recipe]
  (when-not (s/valid? ::recipe recipe)
    (throw (ex-info (s/explain-str ::recipe recipe)
                    (s/explain-data ::recipe recipe))))
  ,,,)
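On the calling side, the explain data travels along with the exception, so a caller can get at it with ex-data. A rough sketch, where try-cook, the :cooked return value, and the shape of the returned map are all made up for illustration, and where the recipe spec is cut down to the bare minimum:

```clojure
;; Assumes Clojure 1.9.0-alpha4, as used in this episode.
;; In the final 1.9 release the namespace is clojure.spec.alpha.
(require '[clojure.spec :as s])

(s/def ::ingredient (s/cat :quantity number?
                           :unit keyword?
                           :name string?))
(s/def ::ingredients (s/* ::ingredient))
(s/def ::recipe (s/keys :req [::ingredients]))

(defn cook! [recipe]
  (when-not (s/valid? ::recipe recipe)
    (throw (ex-info (s/explain-str ::recipe recipe)
                    (s/explain-data ::recipe recipe))))
  ;; Placeholder for the actual cooking.
  :cooked)

(defn try-cook
  "Hypothetical wrapper: returns the result of cook!, or a map
  describing what was wrong with the recipe."
  [recipe]
  (try
    (cook! recipe)
    (catch clojure.lang.ExceptionInfo e
      {:error    (.getMessage e)
       ;; The problems live under the namespaced ::s/problems key.
       :problems (::s/problems (ex-data e))})))

(try-cook {::ingredients [1 :kg "aubergine"]})
;;=> :cooked

(keys (try-cook {}))
;;=> (:error :problems)
```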

I hope with this episode I managed to give you a taste of clojure.spec. There’s a lot more to talk about: more ways to create specs, custom conformers, instrumenting functions and macros, and doing generative testing. If you want more comprehensive coverage I suggest you start with the official guide on clojure.org. There have also been a number of good blog posts on the subject already, and I’ll leave a couple of links in the show notes.