December 28, 2015

Clojure SQL library showdown

I heard you like DSLs

When it comes to database access, I’ve become less particular over the years as I’ve gotten more and more comfortable with SQL. These days I’m happier to write the SQL to get just the thing I want, than wrestle with an ORM to produce the same thing.

Nevertheless, there’s something to be said for abstractions over SQL, and they seem to have lasting popularity. In this post, I’ll compare a few popular SQL DSLs for Clojure, and contrast them all with their mutual pappy, clojure.java.jdbc.

The contenders are:

These are all the libraries I could find that seemed to be maintained. Not included in this list is Hyperion, a project attempting to be a generic datastore abstraction. It appears to be unmaintained, which is a damn shame, because Hyperion is the most interesting of the lot.

Two other projects that didn’t make it are Yesql and HugSQL. Both of these look interesting, but their raison d’etre is to let you put SQL in SQL files and load them in your code. I encourage you to check these out because this is a fine way to go about things, but for the purposes of today’s competition I’ll have to disqualify them. It’s just too easy to make full-featured SQL queries when you’re writing SQL.

Introducing the challengers!

I should note that I’ve never used any of these, just java.jdbc, so this is all first impressions.

As a basis for comparison, I’ll show you what the following query looks like in each of them:

SELECT *
FROM users
WHERE active = TRUE
ORDER BY created
LIMIT 5
OFFSET 3

Java.JDBC

This is, in essence, the base library for database interactions in Clojure, but it exposes a quite-functional API nonetheless. It’s included in this competition to highlight the bottom line; any library that requires something worse than JDBC isn’t really adding much value, is it?

(clojure.java.jdbc/query
  ["SELECT * FROM users WHERE active = TRUE ORDER BY created LIMIT 5 OFFSET 3"])

Korma

Korma is perhaps the oldest, and certainly most popular SQL DSL for Clojure. It uses a macro-based DSL to represent queries as if they were Clojure forms:

(defentity users)

(select users
  (where {:active true})
  (order :created)
  (limit 5)
  (offset 3))

It has a pretty comprehensive documentation site, which is a major bonus.

Unlike the next two options, Korma is designed to replace java.jdbc for your database access needs. It provides a comprehensive API for declaring database relationships and properties, and you’ll probably never need to drop into SQL unless you want to use some database-specific features. That said, you can still use the sql-only macro to retrieve an SQL vector from Korma to do with as you please.

HoneySQL

HoneySQL exists to represents queries as a map, which you can then format into an SQL string for use with data.jdbc:

(h/format {:select [:*]
           :from [:users]
           :where [:= :active true]
           :order-by [:created]
           :limit 5
           :offset 3})

It also provides helper functions for building said map. Each function works like an assoc, so you can pipe them together, which ends up looking very korma-y

(-> (select :*)
    (from :users)
    (where := :active true)
    (order-by :created)
    (limit 5)
    (offset 3)
    (h/format))

Both of those produce a vector containing an SQL string and any bound parameters, which you can plug (in this case) directly into java.jdbc’s query function to get a result back.

SQLingvo

SQLLingvo works a lot like HoneySQL, but defines a slightly different-looking DSL and eschews the intermediary maps (actually, it does use maps if you try executing the functions alone, but they’re a lot less readable than HoneySQL’s). Instead, the base query functions (select etc.) can contain forms modifying the query, and it applies the transformations internally rather than externally as in honey.

(select db [*]
        (from :users)
        (where '(= :active true))
        (order-by :created)
        (limit 5)
        (offset 3))

Here, the db argument is defined by a call to sqlingvo.db.postgresql, which you might expect means a query is executed by the above. This is not the case! The db argument is apparently there to tell sqlingvo how to quote the query, which could be a nice feature if you use a particularly uptight database. The function above produces an SQL vector for use with query just like both honeysql examples.

oj

OJ is basically just like HoneySQL, representing queries as Clojure maps. The biggest difference is that oj provides a facility (exec) for executing the maps directly, rather than rendering them as strings and passing them to jdbc:

(oj/exec {:table :users
          :where {:active true}
          :order-by :created
          :limit 5
          :offset 3} db-spec)

Like honeysql, OJ also provides utilities for not using a map, too. However, they’re so basic that reading their implementation feels almost sarcastic. I tried to render the above example, but offset is not present (and if it was, it would just be #(assoc %1 :offset %2)). OJ is really about using maps.

Suricatta

As usual, I found funcool’s library off in the corner doing its own thing. Their libraries are usually pretty good stuff though, so let’s not leave this one out.

Suricatta is a combination jdbc helper (providing some nice refinements over jdbc’s api) and Korma-esque query DSL. Here’s what the latter looks like:

(-> (d/select :email)
    (d/from :users)
    (d/where ["active = ?" true])
    (d/order-by :created)
    (d/limit 5)
    (d/offset 3)
    (sqlvec))
; => ["select email from users where (active = ?) order by created asc limit ? offset ?" true 5 3]

This should all be starting to feel very familiar by now.

Aggregate

Now for something less familiar. Where most of the above focus on efficiently querying select data, aggregate provides a very simple API for a very complex task: loading and saving related data. It’s not really fair to call it an SQL DSL, because it’s quite different.

Aggregate has only the most spartan facility for loading data, so it can’t really do the above. It’s hard to explain, but it will become clear with examples.

Alright, so let’s see how they all stack up. The rest of this post will be fairly rosetta-code-esque, with some discussion at the end.

Configuring

Here’s the data we’ll be testing with. It’s pretty minimal:

+-----------+         +-------+
| employees |         | teams |
|-----------|         |-------|
| team_id   | ------> | id    |
| id        |         | name  |
| username  |         +-------+
| active    |
| created   |
+-----------+

Let’s see how to set all these libraries up. Most of the can use the same db-spec as java.jdbc, so we’ll open with that and then see the rest.


;; core.clj

(ns databases.core
 (:require [environ.core :refer [env]]))

(defonce db-spec {:subprotocol "postgresql"
              :subname (str "//127.0.0.1:5432/" (:database-name env))
              :classname "org.postgresql.Driver"
              :user (:database-user env)
              :password (:database-password env) })


;; korma.clj

(ns databases.korma
  (:require
    [korma.db :as kdb]
    [korma.core :refer :all]
    [databases.core :refer [db-spec]]))

; Sets up the database that will be used for subsequent queries!
(kdb/defdb db (kdb/postgres db-spec))

; Set up our testing entities
(defentity teams
  (has-many employees {:fk :team_id}))

(defentity employees
  (belongs-to teams {:fk :team_id}))


;; honeysql.clj

(ns databases.honeysql
  (:require [databases.core :refer [db-spec]]
            [clojure.java.jdbc :as j]
            [honeysql.core :as h]
            [honeysql.helpers :refer :all]))


;; sqlingvo.clj

(ns databases.sqlingvo
  (:refer-clojure :exclude [group-by distinct update])
  (:require
    [databases.core :refer [db-spec]]
    [clojure.java.jdbc :as j]
    [sqlingvo.db :as sdb]
    [sqlingvo.core :refer :all]))

(def db (sdb/postgresql db-spec))


;; oj.clj

(ns databases.oj
  (:require [databases.core :refer [db-spec]]
            [oj.core :as oj]
            [oj.modifiers :refer [query where order limit offset]]))


;; suricatta.clj

(ns databases.suricatta
  (:require [databases.core :refer [db-spec]]
            [jdbc.core :as j]
            [suricatta.core :as sc]
            [suricatta.dsl :as d]
            [suricatta.format :refer [get-sql sqlvec]]))


;; aggregate.clj

(ns databases.aggregate
  (:require [aggregate.core :as agg]
            [clojure.java.jdbc :as j]
            [databases.core :refer [db-spec]]))


(def er ; entity relation
  (agg/make-er-config
    (agg/entity :teams
                (agg/->n :employees :employees {:fk-kw :team_id}))
    (agg/entity :employees
                (agg/->1 :team :teams {:owned? false}))))

Create and Drop tables

Of these options, SQLingvo is the only one with reasonable facilities. Most don’t attempt it at all, and Suricatta’s implementation only allows you to provide the column name, type, and nullability; if I can’t add defaults or set up a primary key, I don’t want to be using it.

Here’s how SQLingvo does it:

(j/execute! db-spec
  (sql (create-table db :teams
                (column :id :serial :primary-key? true)
                (column :name :text :not-null? true))))

(j/execute! db-spec
  (sql (create-table db :employees
                    (column :id :serial :primary-key? true)
                    (column :team_id :int) ; No foreign key function
                    (column :username :varchar :length 40 :not-null? true)
                    (column :active :boolean :not-null? true :default false)
                    (column :created :timestamp-with-time-zone :not-null? true :default '(now)))))

(j/execute! db-spec (sql (drop-table db [:employees])))
(j/execute! db-spec (sql (drop-table db [:teams])))

Not bad!

Winner: SQLingvo. That said, you should be creating tables with a migration library anyhow.

Speaking of migration libraries, your options there are almost as varied; it took no time at all to root out ragtime, joplin, drift, migratus, and lobos, all of which seem to be in a reasonable state of repair. Perhaps I’ll do a follow-up on those if anyone’s interested.

I’ve opted to test this by inserting two dependent records. This makes the test a lot more interesting, and also probably more like what you would actually see in production.

;; core.clj
(j/with-db-transaction
  [conn db-spec]
  (let [[team] (j/insert! conn :teams
                          {:name "Team JDBC"})]
    (j/insert! conn :employees
                {:username "jdbcemployee"
                :team_id (:id team)
                :active true})))


;; korma.clj
(transaction
  (let [team (insert teams
                     (values {:name "Team Korma"}))]

    (insert employees
            (values {:username "kormauser"
                    :team_id (:id team)
                    :active true}))))

;; aggregate.clj
(agg/save! er db-spec :employees
           {:username "aggregateuser"
            :active true
            :team {:name "Team Aggregate"}})

For the case of insertion, oj, honeySQL, sqlingvo, and suricatta lack support for returning the result, which means an extra query and more room for error. All of those also require a library like java.jdbc to execute their queries anyhow, so you’d be better off just doing that.

It is no stretch to say that this is exactly what aggregate was designed to do, and it does it well. Note that it creates a new team, since no :id was provided in the nested team declaration.

Winner: Aggregate. Korma and jdbc do a good job, though.

Get all employees for a team

Instead of a boring old query, we’ll do one with a join – or at least, one where a join makes the most sense. Let’s see how the contenders do.


;;jdbc.clj

(j/query db-spec ["SELECT e.* FROM employees e
                   INNER JOIN teams t ON t.id=e.team_id
                   WHERE t.name=?" "Team JDBC"])


;;korma.clj

(select teams
        (with employees)
        (where {:name "Team Korma"}))


;; honeysql.clj

(j/query db-spec (h/format {:select [:*]
                            :from [:employees]
                            :join [:teams [:= :teams.id :employees.team_id]]
                            :where [:= :teams.name "Team HoneySQL"]
                            }))


;;sqlingvo.clj

(j/query db-spec (sql (select db [:*]
             (from :employees)
             (join :teams [:on '(= :teams.id :employees.team_id)])
             (where '(= :teams.name "Team SQLingvo")))))


;;oj.clj

(let [[{id :id}] (oj/exec {:table :teams :select [:id] :where {:name "Team OJ"} :limit 1}
                          db-spec)]
  (oj/exec {:table :employees
            :select [:*]
            :where {:team_id id}} db-spec))


;;suricatta.clj (j is for clojure.jdbc)

(j/fetch db-spec (-> (d/select :*)
    (d/from :employees :teams)
    (d/where (d/and ["employees.team_id=teams.id"]
                    ["teams.name=?" "Team Suricatta"]))
    (sqlvec)))


;;aggregate.clj

(let [[{team-id :id}] (j/query db-spec
                               ["SELECT id FROM teams WHERE name=? LIMIT 1" "Team Aggregate"])]
  (agg/load er db-spec :teams team-id))

Here’s where a few of the less-known contenders start to fall down. OJ gets bottom grades here, since it doesn’t support joins at all. Suricatta comes in second-to-last – it comes with a join method, ostensibly, but that just threw errors. It was saved by its multi-table select ability, and Postgres’ support for implicit joining via the where clause.

SQLingvo has no documentation about its join, and it took some substantial source-code reading and trial-and-error to formulate a working join, but here it is. HoneySQL’s join looks similar, except that they managed to document it – extra points to them.

Aggregate is aware of its limitations, and suggests using a separate query to look up the id before loading the whole mess with load. A bit of extra work, but it does a great job, returning the team record with a nested list of employees to boot.

Korma also returns a nested employee list, but it does so with much less coaxing. Top marks to Korma on this test.

Winner: Korma, no doubt.

Update a record using an SQL function

To make our update a bit more interesting, let’s see if we can make these libraries update an employee’s username to append a tag.

;; jdbc
(j/execute! db-spec
            ["UPDATE employees SET username=concat(username, ?)
              WHERE username=?"
              "-jdbc" "jdbcemployee"])


;;korma.clj

(update employees
        (set-fields {:username (sqlfn "concat" :username "-korma" )})
        (where {:username "kormauser"}))


;;honeysql.clj

(j/execute! db-spec
  (h/format {:update :employees
             :set {:username (h/call "concat" :username "-honey" )}
             :where [:= :username "honeysqluser"]}))


;;sqlingvo.clj

(j/execute! db-spec
  (sql (update db :employees
          {:username '(concat :username "-sqlingvo")}
          (where '(= :username "sqlingvoguy")))))

Suricatta and OJ lack any (documented) support for calling SQL functions, and Aggregate is nowhere close, so this one is a four-way race.

In this case, I think each does a fine job. SQLingvo’s, again, took some trial and error, but I think it came out most succinct (although the conflation of what a quoted list means is a bit troubling here). I’m going to call it a tie between Korma, HoneySQL, and SQLingvo for this one.

Discussion

That was a pretty short test, but I assure you it was still a pain in the ass, so I’m going to stop here. Let’s review how I feel about each of these after a day of dicking around with them:

java.jdbc held its own the whole time. There’s nothing wrong with using just-plain-jdbc for your needs, especially given the ease of dropping into just-plain-sql in those cases where you need to.

Korma has a bit of magic going on (setting up a persistent connection pool in a call to defdb feels like something I didn’t ask for) but it definitely gave me the least trouble in all of these tests, and did a lot of heavy lifting when it came to relationships. If you’re looking for something like an ORM, this is probably the closest you’ll get in Clojure right now. I will definitely consider it for future projects.

HoneySQL is well-done, and I really like the map-to-sql way of working. If I could have contrived a test which really required composable query chunks, I expect HoneySQL would have come out on top.

SQLingvo seems to have a few more features than HoneySQL, but they definitely compete in the same space. You pay for those extra features by losing documentation, which was annoying in places, and I spent more time than I usually like to reading the source. Still, a perfectly servicable SQL DSL. Use this if your application has a need to dynamically create or drop tables.

Suricatta is servicable but incomplete, dropping out of two of my tests and barely passing a third. It’s got potential, but unless you have a big reason to use JOOQ, you might try something else. That said, beyond the DSL it has some strong support for atomic transactions and lazy fetching, so you might want to check it out if your use-case requires that.

OJ is.. ok, I guess. It missed a few tests and generally didn’t stand out from the crowd. Also, having the db-spec as the second argument to oj/exec bothered me more than I think it should have. Use this if writing database-interface functions in a single -> is the most important thing for your application.

Aggregate was a non-competitor in most of these, but by gum, it found the one thing it does and did it perfectly. Include aggregate alongside honeysql or lingvo if you need to dump and load complex nested data into your SQL database.

Conclusion

Use HoneySQL or SQLingvo if you want a little magic, Korma if you want a lot, or just plain java.jdbc if you just want to write some queries and move on dammit.