Lambda Island

Seqs are a key abstraction in how Clojure processes data, but what exactly is a “seq”, and how does it relate to “seqable?”. Time for a deep dive in Clojure and ClojureScript source code to discover the finer points. Learn exactly what seqs and seqables are, what they do, and how you can use them to your advantage.

Transcript

Clojure 1.9 introduced a new function called seqable?, which seemed like a good opportunity to have a closer look at what exactly a seq is, and what it does, both in Clojure and in Clojurescript.

When modeling data in Clojure you mostly use Clojure’s persistent functional data structures: vectors, maps, sets, and occasionally lists. This may seem obvious now, but it’s a big departure from LISP tradition. Many Lisps only have one of these: the singly linked list.

Having a single type may seem limiting, but it does have an important advantage:all data can be processed in the same way. In other words, you get a uniform interface, making it easy to write general purpose functions that work with a wide range of data.

;; Clojure - specialized collection types
[1 2 3]
{:a :b :c :d}
#{"red" "yellow" "blue"}

;; LISP - lists are all you'll ever need
'(1 2 3)
'((:a :b) (:c :d))
'("red" "yellow" "blue")

To process a list you only need two functions, one to get the first element of the list, and one to get the rest of the list. We know these as first and rest, originally they were called car and cdr, really for no good reason whatsoever.

With these you can deconstruct a list incrementally, one piece at a time, which naturally leads to recursive functions.

To construct a list in the same way we need a way to add one piece back on to it, which is done with cons.

If everything is a list, then first, rest, and cons are all you’ll ever need to handle data.

user> (def x (list 1 2 3))
#'user/x
user> (first x)
1
user> (rest x)
(2 3)
user> (cons 0 x)
(0 1 2 3)
user>

Not everything in Clojure is a list though. To have its cake and eat it too, Clojure introduced seqs. There are many types of seqs. Plain lists are seqs, they are very similar to the lists from Lisp, but they also keep track of their count, so you can get the length of the list in constant time.

Functions like map or filter return “lazy” seqs. This means the values in the end result are only computed as they are needed. Other things like maps and vectors are not seqs, this is something to keep in mind.

user> (type '(1 2 3))
clojure.lang.PersistentList
user> (seq? '(1 2 3))
true

user> (type (map identity '(1 2 3)))
clojure.lang.LazySeq
user> (seq? (map identity '(1 2 3)))
true

user> (type {:a :b})
clojure.lang.PersistentArrayMap
user> (seq? {:a :b})
false

Some functions like range, vals, or keys return their own specific type of seq.

The good news is that you don’t have to care what type they are, they all support the same basic operations. They behave the same, because they all implement the ISeq interface. Sensitive viewers please be warned, in what follows we’ll be looking at some Java code.

user> (type (range 100))
clojure.lang.LongRange
user> (seq? (range 100))
true
user> (type (vals {:a :b :c :d}))
clojure.lang.APersistentMap$ValSeq
user> (seq? (vals {:a :b :c :d}))
true

This is the ISeq interface. An interface is a bit like a protocol in Clojure. Note how this interface defines four methods, but it doesn’t provide an implementation for those methods, that part is left up to the concrete seq implementations.

first, and next should look familiar, as well as cons. more is the equivalent of Clojure’s rest. This shows that the methods you see here are not the same as the first, next and rest functions you would call from Clojure. They’re related though. These Java methods are called under the hood by their counterparts defined in clojure.core.

/**
 *   Copyright (c) Rich Hickey. All rights reserved.
 *   The use and distribution terms for this software are covered by the
 *   Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
 *   which can be found in the file epl-v10.html at the root of this distribution.
 *   By using this software in any fashion, you are agreeing to be bound by
 * 	 the terms of this license.
 *   You must not remove this notice, or any other, from this software.
 **/

package clojure.lang;

/**
 * A persistent, functional, sequence interface
 * <p/>
 * ISeqs are immutable values, i.e. neither first(), nor rest() changes
 * or invalidates the ISeq
 */
public interface ISeq extends IPersistentCollection {

    Object first();

    ISeq next();

    ISeq more();

    ISeq cons(Object o);

}

Seqs predate clojure protocols, and so they use Java interfaces instead. This isn’t likely to change, both for backwards compatibility, and for performance reasons. It’s hard to beat plain Java method dispatch.

Clojurescript on the other hand got to start with a clean slate, and did decide to implement seqs as protocols. This is great, because it means you can extend the protocol to existing JavaScript objects, making them behave like seqs. This is something that isn’t possible with Java’s interfaces.

The function names in these protocols start with a dash, which is a common pattern for protocols in cljs.core. The first and rest functions will again dispatch internally to -first and -rest based on the type of their argument.

(defprotocol ISeq
  "Protocol for collections to provide access to their items as sequences."
  (-first [coll]
    "Returns the first item in the collection coll. Used by cljs.core/first.")
  (^clj -rest [coll]
    "Returns a new collection of coll without the first item. It should
     always return a seq, e.g.
     (rest []) => ()
     (rest nil) => ()"))

(defprotocol INext
  "Protocol for accessing the next items of a collection."
  (^clj-or-nil -next [coll]
    "Returns a new collection of coll without the first item. In contrast to
     rest, it should return nil if there are no more items, e.g.
     (next []) => nil
     (next nil) => nil"))

So seqs are great, because they allow us to handle many different types of collections in the same way, but vectors, sets, and maps aren’t seqs, they don’t implement the ISeq interface, and yet we treat them just the same. This is thanks to a handy little function called, perhaps unsurpisingly, seq.

This is the definition of seq. From the metadata you can tell it always returns an ISeq. There’s not much code because the real implementation is again in Java, in a special class called clojure.lang.RT, short for Runtime.

seq does its best to turn whatever you give it into a seq. Functions like first, rest, map, or filter all use seq internally to convert the argument to a seq if it isn’t one already. Anything that can be converted to a seq this way is considered “seqable”.

(def ^{:arglists '(^clojure.lang.ISeq [coll])
       :doc "Returns a seq on the collection. If the collection is
    empty, returns nil.  (seq nil) returns nil. seq also works on
    Strings, native Java arrays (of reference types) and any objects
    that implement Iterable. Note that seqs cache values, thus seq
    should not be used on any Iterable whose iterator repeatedly
    returns the same mutable object."}
  seq
  (fn ^clojure.lang.ISeq [coll]
    (.seq clojure.lang.RT coll)))

To see how that works in practice lets have a look at first. Here’s the implementation. Uh yeah, this one delegates its work to clojure.lang.RT again, so let’s look there.

(def
 ^{:arglists '([coll])
   :doc "Returns the first item in the collection. Calls seq on its
    argument. If coll is nil, returns nil."
   :added "1.0"
   :static true}
 first (fn ^:static first [coll] (. clojure.lang.RT (first coll))))

If the object passed in is already a seq, then first know that it can call the first method from the ISeq interface.

If not then first uses the seq function to try to convert its argument to a seq.

If the collection is empty or nil, then the seq function returns nil, which in Java is called null, that’s why this check is needed.

Now that the argument has been converted to a seq, it’s safe to call the first method, which will invoke the relevant code for the specific type of seq.

static public Object first(Object x){
    if (x instanceof ISeq) {
        return ((ISeq) x).first();
    }

    ISeq seq = seq(x);

    if (seq == null) {
        return null;
    }

    return seq.first();
}

So what can be turned into a seq? Quite a lot of things, it turns out.

The first check is simply to see if the argument is already a concrete seq, if so then no coversion is needed.

If the collection is lazy, then calling seq on it will “realize” it, computing the next chunk of values.

All Clojure collections implement a one-method interface called Seqable, containing a single seq method. This is how things like vectors and sets get turned into seqs.

seq of nil is simply nil.

In Java-land instead of seqs and seqables there are iterators and iterables. If an object is iterable then it is also seqable, in other words, Java collection types can be used transparently in Clojure. The same goes for Java native arrays.

Strings and string-like things like buffers and stringbuilders are also seqable, in which case you get a seq of the individual characters in the string.

The final check is for Java’s associative collections, which all implement the java.util.Map interface.

static ISeq seqFrom(Object coll){
    if(coll instanceof Seqable)
        return ((Seqable) coll).seq();
    else if(coll == null)
        return null;
    else if(coll instanceof Iterable)
        return chunkIteratorSeq(((Iterable) coll).iterator());
    else if(coll.getClass().isArray())
        return ArraySeq.createFromObject(coll);
    else if(coll instanceof CharSequence)
        return StringSeq.create((CharSequence) coll);
    else if(coll instanceof Map)
        return seq(((Map) coll).entrySet());
    else {
        Class c = coll.getClass();
        Class sc = c.getSuperclass();
        throw new IllegalArgumentException("Don't know how to create ISeq from: " + c.getName());
    }
}

In Clojure Seqable is an interface again, so you can’t add support for it to existing classes, although if you were to implement new data types from scratch, you could use it to make them instantly Clojure compatible.

Luckily Java iterables are also seqable, so most collection types from Java or other JVM languages will work out of the box with Clojure, even when provided by third party libraries.

/**
 *   Copyright (c) Rich Hickey. All rights reserved.
 *   The use and distribution terms for this software are covered by the
 *   Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
 *   which can be found in the file epl-v10.html at the root of this distribution.
 *   By using this software in any fashion, you are agreeing to be bound by
 * 	 the terms of this license.
 *   You must not remove this notice, or any other, from this software.
 **/

/* rich Jan 28, 2009 */

package clojure.lang;

public interface Seqable {
    ISeq seq();
}

In ClojureScript there’s ISeqable, which is a protocol containing a single -seq function. This one you can extend to work with types from third party libraries.

(defprotocol ISeqable
  "Protocol for adding the ability to a type to be transformed into a sequence."
  (^clj-or-nil -seq [o]
    "Returns a seq of o, or nil if o is empty."))

For example, the Phaser game library contains a Phaser.Group type which is a kind of collection. I can extend Phaser.Group to implement ISeqable. By providing a custom implementation for -seq it’s now possible to use Group objects with first, map, or doseq, just like ClojureScript collections.

(extend-type js/Phaser.Group
  ISeqable
  (-seq [group]
    (let [children (.-children group)]
      (map #(aget children %) (range (.-length children))))))

There’s one aspect we’ve glossed over so far, which is the distinction between nil and the empty list. In LISP these are one and the same thing. Writing an empty list is simply a different notation for nil.

This is convenient, because it means that checking if a list is empty, such as for a termination clause of a recursive function, comes down to doing a nil-check.

 % rlwrap sbcl
This is SBCL 1.3.1.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* nil

NIL
* ()

NIL
*

In Clojure the empty list and nil are two separate concepts, so you would need two checks instead of one. The seq abstraction tries to collapse these though. When calling seq on the empty list the result is nil. This is why Clojure doesn’t have a dedicated function to check for emptiness. Instead you use seq.

You can also treat nil as an empty list and add values onto it, or pass it to any other function that works with seqables, like first or map, and it will be treated as an empty list.

user> (seq nil)
nil
user> (seq ())
nil
user> (conj nil 1)
(1)
user> (type (conj nil 1))
clojure.lang.PersistentList
user> (map identity nil)
()
user> (first nil)
nil
user>

The point of this episode has been to encourage you to challenge your assumptions. Even basic and innocious looking concepts can have hidden depths. Doing a deep dive like this can help to adjust your mental model of how Clojure works, which ulimately makes you a better coder.

Show Notes

Things that are seq?

nil
lists, '(1 2 3)
conses, (cons 0 '(1 2 3))
lazy sequences, e.g. the result of map, filter, remove, for

Things that are seqable?

any seq
instances of Seqable (Clojure) or ISeqable (ClojureScript)
Java and Javascript arrays
instances of java.lang.Iterable (most Java collections)
Strings (also StringBuffer and StringBuilder)
instances of java.lang.Map (Java associative collections)

A note about `cons` and `conj`

This episode makes frequent mention of cons, since it is one of the core operations implemented on seq. In idiomatic Clojure code however cons is rarely used, instead use conj.

What are the differences?

with cons the item to be added comes first, with conj it comes last
conj can add more than one thing at a time
cons will convert the collection to a seq, and return a seq, conj returns the same type as the input collection
cons always prepends the element, with conj it depends on the collection. For lists it prepends, when using a vector it adds to the end.

(cons 1 [4 5 6]) ;;=> (1 4 5 6)
(conj [4 5 6] 1 2 3) ;;=> [4 5 6 1 2 4]

Seq and Seqable