
The Lambda Island Blog

Clojure Gotchas: "contains?" and Associative Collections

When learning a programming language we rarely read the reference documentation front to back. Instead someone might follow some tutorials, and look at sample code, until they’re confident enough to start a little project for practice.

From that point on the learning process is largely “just in time”, looking up exactly the things you need to perform the task at hand. As this goes on you might start to recognize some patterns, some internal logic that allows you to “intuit” how one part of the language works, based on experience with another part.

Developing this “intuition” — understanding this internal logic — is key to using a language effectively, but occasionally our intuition will be off. Some things are simply not obvious, unless someone has explained them to us. In this post I will look at something that frequently trips people up, and attempt to explain the underlying reasoning.

contains?

Let’s start with perhaps the greatest source of frustration in clojure.core, the contains? function. Based on its name you’d be excused for thinking that it checks whether a collection contains a certain value.

(def beatles ["John" "Paul" "Ringo" "George"])

(contains? beatles "Ringo")
;;=> false

Ahum… what the h*ck, Clojure?

A symptom of the same underlying cause is confusion over get vs nth. Why do we need both? When do you use which?

(get beatles 1) ;;=> "Paul"
(nth beatles 1) ;;=> "Paul"

Collection Traits

To understand what’s going on we need to look at the underlying abstractions to Clojure’s data types. All of Clojure’s concrete types like vectors, sets, or maps implement one or more of these “abstractions”. You can also think of them as “having certain traits”.

  • Seq(uential)
  • Seqable
  • Associative
  • Counted
  • Indexed
  • Sorted
  • Reversible

Functions in clojure.core do their work based on these abstractions. For instance, you can call first on any Clojure collection because they all implement Seq or Seqable, whereas assoc or update will only work on collections that implement Associative.
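As a small illustration of a function working against an abstraction rather than a concrete type, the sketch below calls first on several different seqable things. The results are what a Clojure REPL on the JVM gives; in ClojureScript the last call returns the one-character string "1" instead, since there is no separate character type. The set example uses a single element because set ordering is not something to rely on.

```clojure
;; first only needs its argument to be seqable
(first '(1 2))   ;;=> 1
(first [1 2])    ;;=> 1
(first {1 2})    ;;=> [1 2]  (a map entry: key and value)
(first #{1})     ;;=> 1
(first "12")     ;;=> \1     (a character, on the JVM)
```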

This blog post by Alex Miller explains these traits in great detail.

Of these, Seq and Seqable are arguably the most important to understand, closely followed by Associative. There’s a free Lambda Island episode with everything you need to know about seqs, which is a trait shared by all Clojure collections, as well as some other things like strings or Java arrays.

(seqable? '(1 2))  ;;=> true
(seqable? [1 2])   ;;=> true
(seqable? {1 2})   ;;=> true
(seqable? #{1 2})  ;;=> true
(seqable? "12")    ;;=> true

Functions like first, rest, map, or cons can operate on any seq/seqable. That’s also why the return type of map or cons may be different from what you passed it. The only guarantee is that the result is again a seq.
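To make the point about return types concrete, here is a short sketch. It shows that map and cons hand back a seq even when given a vector, and two common ways to ask for a vector again.

```clojure
;; map always returns a (lazy) seq, whatever you pass in
(map inc [1 2 3])            ;;=> (2 3 4)
(vector? (map inc [1 2 3]))  ;;=> false
(seq? (map inc [1 2 3]))     ;;=> true

;; cons also returns a seq, with the new element at the front
(cons 0 [1 2 3])             ;;=> (0 1 2 3)
(vector? (cons 0 [1 2 3]))   ;;=> false

;; to get a vector back, ask for one explicitly
(mapv inc [1 2 3])           ;;=> [2 3 4]
(into [] (map inc [1 2 3]))  ;;=> [2 3 4]
```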

Maps and vectors are associative. You can think of the first as a mapping from keys to values, and the second as a mapping from indexes to values. get, assoc, update, and their *-in variants work on associative data structures. dissoc is the odd one out: it only works on maps.

(associative? '(1 2))  ;;=> false
(associative? [1 2])   ;;=> true
(associative? {1 2})   ;;=> true
(associative? #{1 2})  ;;=> false
(associative? "12")    ;;=> false

(get {1 2} 1) ;;=> 2
(get [1 2] 0) ;;=> 1

(update-in {:x {:y [0 0 0]}} [:x :y 1] inc)
;;=> {:x {:y [0 1 0]}}

contains? works on Associative collections, where it checks if the collection contains a mapping for the given input. For maps that means it will check if a given key is there, and for vectors it returns true if the index exists.

(contains? {:a :b} :a)                         ;;=> true
(contains? {:a :b} :c)                         ;;=> false

(contains? [:a :b] 0)                          ;;=> true
(contains? [:a :b] 1)                          ;;=> true
(contains? [:a :b] 2)                          ;;=> false

Sets as funny maps

Sets are a bit of a special case. While sets are not associative, several of the associative operations, including contains?, will also work on sets.

Suppose that Clojure didn’t have sets. In that case you could make do by using maps, with each value identical to its key. It turns out that for many operations that is exactly how sets behave. Whenever I’m not sure whether a set would work in a given context, I try to imagine what would happen if I used a map instead.

(def real-set #{:a :b})
(def pseudo-set {:a :a, :b :b})

(real-set :a)                                  ;;=> :a
(pseudo-set :a)                                ;;=> :a

(get real-set :a)                              ;;=> :a
(get pseudo-set :a)                            ;;=> :a

(contains? real-set :a)                        ;;=> true
(contains? pseudo-set :a)                      ;;=> true
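The analogy also holds for missing elements, but it has limits. The sketch below repeats the two definitions so it can be run on its own, then shows one place where a set and its map look-alike part ways.

```clojure
(def real-set #{:a :b})
(def pseudo-set {:a :a, :b :b})

;; a missing element behaves like a missing key
(real-set :c)                  ;;=> nil
(pseudo-set :c)                ;;=> nil
(get real-set :c :nope)        ;;=> :nope
(get pseudo-set :c :nope)      ;;=> :nope

;; the limit of the analogy: sets are not Associative
(associative? real-set)        ;;=> false
(associative? pseudo-set)      ;;=> true

;; so (assoc real-set :c :c) throws; sets grow with conj instead
(contains? (conj real-set :c) :c)  ;;=> true
```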

Now that you know exactly how contains? works, you can attempt to make sense of its docstring.

cljs.user=> (doc contains?)
-------------------------
cljs.core/contains?
([coll v])
  Returns true if key is present in the given collection, otherwise
  returns false.  Note that for numerically indexed collections like
  vectors and arrays, this tests if the numeric key is within the
  range of indexes. 'contains?' operates constant or logarithmic time;
  it will not perform a linear search for a value.  See also 'some'.
nil

So to recap, for maps contains? checks for the presence of a key, for sets it does what you would expect, since keys=values, and for vectors it checks whether the index is present.

(contains? #{:a :b} :a)                        ;;=> true
(contains? #{:a :b} :c)                        ;;=> false

(contains? {:a :b} :a)                         ;;=> true
(contains? {:a :b} :b)                         ;;=> false

(contains? [1 2] 1)                            ;;=> true
(contains? [1 2] 2)                            ;;=> false

If you do want to search a collection that is “only” a seq, the common idiom is to combine some with a set literal. The set acts as the predicate, so this returns the matched value, which is truthy.

(def beatles ["John" "Paul" "Ringo" "George"])

(some #{"Ringo"} beatles)
;;=> "Ringo"

An important caveat: if the value you’re looking for could be nil or false, then the set literal won’t work, and you need a more explicit check.

(def haystack [false])

(some #{false} haystack)
;;=> nil

(some #(= false %) haystack)
;;=> true
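If you need this check often, you could wrap the explicit version in a small helper. The sketch below is one way to do it; in-coll? is a hypothetical name, not something that exists in clojure.core. It always returns a boolean, and like some it does a linear search.

```clojure
;; in-coll? is a hypothetical helper, not part of clojure.core
(defn in-coll?
  "Returns true if x is equal to some element of coll (linear time)."
  [coll x]
  (boolean (some #(= x %) coll)))

(in-coll? ["John" "Paul"] "Paul")  ;;=> true
(in-coll? [false] false)           ;;=> true
(in-coll? [nil] nil)               ;;=> true
(in-coll? [1 2 3] 4)               ;;=> false
```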

The difference between get and nth is now hopefully also clear. While get does a lookup in an associative collection, nth will find the nth element in a sequential collection. If the collection happens to also be indexed, then nth works in constant time, otherwise it will take linear time to walk the sequence.

;; Vectors: they're roughly the same
(get [:a :b :c] 1)                             ;;=> :b
(nth [:a :b :c] 1)                             ;;=> :b

;; Lists: only nth makes sense
(get '(:a :b :c) 1)                            ;;=> nil
(nth '(:a :b :c) 1)                            ;;=> :b

;; Sets: only get makes sense
(get #{:a :b :c} :a)                           ;;=> :a
(nth #{:a :b :c} 1)                            ;;=> ! Exception

;; Maps: only get makes sense
(get {:a :b} :a)                               ;;=> :b
(nth {:a :b} 1)                                ;;=> ! Exception
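One more practical difference shows up when the index is out of range. The sketch below compares the two on a vector, with and without a not-found default, and then uses nth on plain seqs, where it simply walks to the requested position.

```clojure
;; out-of-range index on a vector
(get [:a :b :c] 5)         ;;=> nil
(get [:a :b :c] 5 :none)   ;;=> :none
(nth [:a :b :c] 5 :none)   ;;=> :none
;; (nth [:a :b :c] 5)      ;; throws an index-out-of-bounds error

;; nth walks any seq, including lazy ones
(nth (range 10 20) 3)      ;;=> 13
(nth (map inc [1 2 3]) 2)  ;;=> 4
```

In other words, get quietly returns nil for a missing key or index, while nth without a default treats it as an error.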

This diagram from a presentation by Alex Miller gives you an overview of which functions work on which abstractions.

[Diagram: Collection Traits]

This may seem overly complicated, but in practice most of these are pretty intuitive, so you don’t need to hang this diagram over your bed to study it every evening. Some functions “color outside the lines”, like contains? working on Associative collections, but also sets, or nth really being more of an Indexed operation, but also accepting seqs. In practice this isn’t as confusing as it may seem. The main cases that do trip people up I’ve tried to explain above.

Many thanks to Daiyi for providing feedback on an earlier draft.

Try out an interactive version of this post in Maria


About the author

Arne divides his time between making Clojure tutorial videos for Lambda Island, and working on open-source projects like Chestnut. He is also available for Clojure and ClojureScript training and mentoring. You can support Arne through his Patreon page.
