Why we started using PostgreSQL with Slick next to MongoDB

At Plot we initially used MongoDB for persistent storage. However, we recently moved a part of our data storage from MongoDB to a PostgreSQL database. The only data that is kept in MongoDB is the data collected from mobile devices.

This article starts with a short introduction to why we moved from MongoDB-only to polyglot storage. After that, I explain how Slick can be used to read data from a SQL database.

Why we moved

There were multiple reasons to move a large part of our data to PostgreSQL. Our data model was hard to fit into an aggregate structure: documents became too large, and data either had to be copied between documents or combined with multiple queries. Data was often duplicated just to allow indexes that kept query performance acceptable. Without joins and referential integrity, it is hard to keep such duplicated data consistent.

A problem we hit with MongoDB was that only values smaller than 1 KB can be indexed (see MongoDB Limits and Thresholds for more info). We ran into this limit when querying for location alerts that a device had not seen before. These views were stored in an array within the document. The schema was similar to this:

{
 "message": "Example alert",
 "seenBy": [...]
}

with the query:

{
 "seenBy": { "$ne": "user" }
}

To keep this query fast enough, an index on seenBy is essential. However, that field quickly became larger than 1 KB, so it was no longer included in the index and queries using the index could return incorrect results. When we were hit by this problem, we started migrating that data to PostgreSQL.

PostgreSQL provides joins, so duplicating data was no longer required, while indexes on the relevant fields remained possible. After migrating to PostgreSQL, this query became trivial:

SELECT message
FROM messages
WHERE id NOT IN (
  SELECT messageId
  FROM views
  WHERE "user" = 'user'
)

Other advantages PostgreSQL has over MongoDB are better tooling and a strict schema. For most of our data the schema was stable: the column names were fixed and most of the columns are mandatory, so making the schema explicit is in this case an advantage.

We keep using MongoDB for statistics data, because it is easier to shard and there is no schema enforcement. The raw statistics data is more free-form, so a schema would be inconvenient there.

What is Slick?

Slick describes itself as a database query library for Scala. It is a layer on top of JDBC that offers a Scala API for sending queries to relational databases.

In Slick you can map a relation (table) from the database to a Scala tuple or a case class. Because of these mappings your queries become strongly typed, and Slick converts the Scala types to the right SQL types. Queries are written with almost the same syntax as read operations on Scala collections.

Besides PostgreSQL, the following database management systems are supported:

  • DB2*
  • Derby/JavaDB
  • H2
  • HSQLDB/HyperSQL
  • Microsoft Access
  • Microsoft SQL Server
  • MySQL
  • Oracle*
  • SQLite

*Requires subscription for production use

How to use Slick?

Before you can fire queries at your database, you have to set up a connection. This is similar to setting up a JDBC connection.

import scala.slick.driver.PostgresDriver.simple._
import Database.threadLocalSession

Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver") withSession {
  // the session is now implicitly available in thread-local storage
}

This sets up a connection and makes the session available to the current thread. Updates are automatically committed. If you don’t want that, you can use a transaction instead: a commit is only performed when the code within the block completes successfully.

Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver") withTransaction {
  // the session is now implicitly available in thread-local storage
}

Before you can pull data from a table, you have to make the table definition available to Slick. You do that by defining a Table object, in which you declare the columns and which of them is the primary key. You also have to define the * method, which lists all columns of the default projection.

object CoffeeTable extends Table[(String, BigDecimal, Int)]("COFFEE") {
  def name = column[String]("NAME", O.PrimaryKey)
  def price = column[BigDecimal]("PRICE")
  def sales = column[Int]("SALES")
  def * = name ~ price ~ sales
}

When this is in place, you can perform a simple SELECT query. The following query selects only the names of the coffees with a price greater than or equal to the minimum price of 1.0; the other columns aren’t included in the generated SELECT.

val minPrice: BigDecimal = 1.0
val query = for {
  c <- CoffeeTable
  if c.price >= minPrice
} yield c.name

val names = query.list

The following query selects complete rows from the coffee table with a price greater than or equal to the minimum price of 1.0. The result is a list of tuples; take(5) limits it to the first five rows.

val minPrice: BigDecimal = 1.0
val query = for {
  c <- CoffeeTable
  if c.price >= minPrice
} yield c

val coffees = query.take(5).list

Most of the time it is more convenient to get the data as case classes instead of tuples. The definition that maps the table directly to a case class is below:

case class Coffee(name: String, price: BigDecimal, sales: Int)
object CoffeeTable extends Table[Coffee]("COFFEE") {
  def name = column[String]("NAME", O.PrimaryKey)
  def price = column[BigDecimal]("PRICE")
  def sales = column[Int]("SALES")
  def * = name ~ price ~ sales <> (Coffee, Coffee.unapply _)
}

To retrieve data from the table, the queries stay the same. However, a query that yields the complete row now returns Coffee instances instead of tuples.
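
For example, the full-row query from before now produces case class instances. A minimal sketch, reusing the mapped CoffeeTable above:

val minPrice: BigDecimal = 1.0
val query = for {
  c <- CoffeeTable
  if c.price >= minPrice
} yield c

// With the mapped table, list returns List[Coffee] instead of tuples
val coffees: List[Coffee] = query.list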

To insert a new coffee into our table, you call the insert method on the table object. With a mapped table you can insert the case class instances directly.

CoffeeTable.insert(
  Coffee("Java", 2.50, 0)
)

To update rows in the table, you first define a query that selects the rows to be updated. Then you call update on that query with the new values for the yielded columns (or a replacement case class instance when you yielded the complete row).

val query = for {
  c <- CoffeeTable
  if c.price < 1.50
} yield c.price

query.update(1.50)

Currently it is only possible to replace values with new values; it isn’t possible to base the new value on the current value of the cell or on another column. To delete rows, you can use the delete method on the query instance.

query.delete

When you want to perform a query that cannot be expressed with Slick expressions, you can still write the SQL yourself with StaticQuery:

import scala.slick.jdbc.StaticQuery

val query = StaticQuery.query[BigDecimal, String]("""
  select c.name
  from coffee c
  where c.price < ?
""")

// Apply the parameter, then run the query
val cheapNames = query(1.50).list

Other features

What I haven’t shown so far are query templates (queries with parameters), the use of custom types in queries, and “direct embedding”.

When you want to reuse queries, you can make use of query templates: queries with parameters. This removes the need to rebuild the query tree every time you perform the query, as sketched below. You can read more about them in the manual.
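
A minimal sketch of a query template using the Slick 1.x Parameters syntax, reusing the CoffeeTable mapping from above:

// The query tree is built once and can be executed with different values
val byMaxPrice = for {
  maxPrice <- Parameters[BigDecimal]
  c <- CoffeeTable if c.price < maxPrice
} yield c.name

val cheap = byMaxPrice(1.50).list
val cheaper = byMaxPrice(1.00).list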

Slick provides mappings for many commonly used Scala types. Often it is convenient to use your own types as well. Slick provides the means to map custom types to types it already knows, so you can use them directly in queries. We use this to map DateTime objects from the Joda-Time library to the SQL Timestamp type, roughly as sketched below.
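
A minimal sketch of such a mapping with the Slick 1.x MappedTypeMapper (the names here are illustrative, not our production code):

import java.sql.Timestamp
import org.joda.time.DateTime
import scala.slick.lifted.MappedTypeMapper

// Teach Slick to store DateTime values as SQL timestamps and back
implicit val dateTimeMapper =
  MappedTypeMapper.base[DateTime, Timestamp](
    dt => new Timestamp(dt.getMillis), // DateTime -> database value
    ts => new DateTime(ts.getTime)     // database value -> DateTime
  )

// Columns can then be declared as column[DateTime]("CREATED_AT")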

Direct embedding is an experimental feature that allows case classes to be bound directly to tables, without a separate mapping table object. It uses Scala’s new macro functionality. As said, this feature is still experimental, so the syntax may change in future versions.

Pros and cons

Slick is a library that sticks close to SQL. It therefore stays clear what the performance implications of a query are and how classes are mapped to the data. It hides DBMS-specific syntax and prevents you from writing code that is vulnerable to SQL injection. The design also allows most queries to be checked at compile time.

Disadvantages are the embedded DSL, which is not always intuitive: when you make a mistake in the syntax, the compile errors are often difficult to understand. The documentation is also not comprehensive on many subjects. Finally, the focus on single-threaded usage makes Slick difficult to combine with Futures.

A workaround for multithreaded use is not to use withSession, but to create sessions yourself and close them when your request has completed, as sketched below. Keep in mind that sessions are stateful, so don’t use the same session from multiple threads at the same time.
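
A minimal sketch of that approach; query stands for one of the queries defined earlier:

val db = Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver")

// One session per request; never share it between threads at the same time
val session = db.createSession()
try {
  val names = query.list(session) // pass the session explicitly
} finally {
  session.close()
}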

For transactions a more hacky workaround is needed, because some methods start a transaction themselves when not already in one (for example insertAll). When you start the transaction yourself, you have to tell the session object about it.

import scala.slick.session.{BaseSession, Database}

val session = Database.forURL("jdbc:...", driver = "...").createSession()
session.conn.setAutoCommit(false)
// tell Slick we are already in a transaction, so it won't start its own
session.asInstanceOf[BaseSession].inTransaction = true
…
session.conn.commit()
session.close()

A good resource for Slick is the website:
http://slick.typesafe.com/
Another good resource is the Google Group:
https://groups.google.com/forum/?fromgroups#!forum/scalaquery
Although the URL still contains the old project name ScalaQuery, it is really about Slick now.


Comments (17)

  1. [...] The slides of the presentations are available here and here. Also have a look at the blog posts we have written about Gatling and Slick. [...]

  2. Alex

    16-05-2013 at 11:56

    MongoDB limits are 1kb for keys not for values!

    • Alex

      16-05-2013 at 12:02

      Sorry, it seems there is a limitation of 1kb for values also – my mistake

      • Mark van der Tol

        16-05-2013 at 13:41

        This is indeed somewhat confusing. A document has a maximum size of 16MB, however the values of the fields you index on have a maximum size of 1 kB. When the value becomes larger, it isn’t included in the index, which may cause queries run on that index to return incorrect values.

  3. Tomek

    17-05-2013 at 07:18

    Recently we all migrated to NoSQL and now some of us are going back. Looks like the pendulum has changed the direction :)

    • Keth

      23-07-2013 at 00:31

      Yeah, because it’s better to use the appropriate tools instead of running after the hypetrain.

  4. Jakub Janecek

    17-05-2013 at 16:13

    You shouldn’t have used Mongodb in the first place, obviously your requirements didn’t match it. Pros and cons of relational DBs are well-known for a long long time!

    • Dirk Louwers

      22-05-2013 at 09:27

      Hi Jakub,

      Actually we are still using MongoDB for a different part of the application and are quite happy with it. Arguably you could comment on any blog that describes lessons learned stating that everything should have gone perfectly from the get-go. Unfortunately we didn’t know about the 1 kB limit on indexed array values, and since this is not prominently brought to the front in the MongoDB documentation, we thought perhaps we could help others by zooming in on the issue. I’m sorry if this was not new to you, but I hope this article can still benefit other readers.

      • Jakub Janecek

        22-05-2013 at 11:48

        Hello Dirk,

        thank you for your reply.

        There is no need to apologize ;) I haven’t said that the article is not helpful. Actually, I don’t even use MongoDB, so it was definitely new to me! I guess I misunderstood the intent of the article, because what made me comment on it was the first and the last paragraph of the “Why we moved” section, where you state kind of obvious reasons for switching from almost any NoSQL DB back to a relational DB. For example, you write that Postgres has “better tooling and a strict schema”. Sorry, but don’t tell me you didn’t know that when you chose MongoDB.

        I didn’t mean to insult you. I just found it a little funny to read such an article after all the articles describing “why NoSQL is better” (not on your blog, generally everywhere). It shows that creating a hype around something just works and developers do not realize the consequences.

  5. [...] Mark van der Tol explained why they (Plot) started using PostgreSQL with Slick instead of MongoDB [...]

  6. Vladimir

    03-06-2013 at 15:16

    I already see the name of the next article – “Why we started use JDBC instead of slick”

    • Dirk Louwers

      03-06-2013 at 15:38

      Nah, that would not be nearly controversial enough ;) Slick isn’t that well known yet. But in earnest, Slick offers enough control over the query output to not be in our way. I would much prefer JDBC to be non-blocking though. We do execute our database calls on an executor backed by a c3p0 connection pool, but it does need more tweaking than our asynchronous MongoDB calls based on ReactiveMongo. Multiple JDBC calls on our dispatcher have the tendency to clog up the thread pool, preventing quick non-blocking jobs from being executed. That’s why we execute our JDBC calls on a separate dispatcher with an optimized thread pool with a size comparable to that of the connection pool.

  7. Cedric Brancourt

    02-10-2013 at 17:12

    You could use an indexation solution to workaround and get things done by the right tool.

    Writing to mongo and reading from an elasticsearch or a kind of.
    It is pretty efficient in some cases.

  8. Java Werks

    08-10-2013 at 23:45

    I recently implemented a set of Scala CRUD unit tests using Jdbc, Sorm and Slick.

    Slick was easily the worst of the three. To say it’s non-intuitive to use, would be a gross understatement. In fact, it may well be the worst database API of all time.

    Sorm was by far the best database API, yet by far the slowest. And that’s sad. It appears to be the brainchild of one guy. And it may have already died. Not sure, though.

    The Jdbc solution, of course, performed blazingly fast as one might expect. But would you want to use it on a large project? No. I doubt you’d want to use Slick either.

    Any Java solution, be it iBatis, Hibernate or JPA (with implementer), is vastly superior to any Scala database solution. I didn’t even bother testing the difficult problems that have been already conquered by said Java solutions.

    Coupled with the incredibly poor Json-scala binding solution that ships with Play, I think TypeSafe has a limited lifespan beyond NYC and the west coast, in terms of future funding.

    Typesafe simply doesn’t put forth a competitive offering vis-a-vis the tried and true Java enterprise solutions that exist today. Typesafe must know this.

    Yet the Scala community is so hell bent on reinventing the wheel that they’re destroying their future in the process. Not to worry, the niche community will survive.

    Forget that Scala, the language, is already a tough enough sale. The Scala ecosystem, to date, is a massive failure in its current state. Will it improve? Maybe. But I doubt it.

    Snatching victory from the jaws of almost certain defeat, I increasingly feel that Java 8 will bury Scala by the end of 2014.

    This critique is coming from someone who has attempted to bring Scala into the enterprise since 2009. In fact, I’ve been trying like hell the past 4 months, but to no avail. It’s a no go. But I’ll hang in there for another few months. Just maybe it’ll turn around.

    Rod Johnson was right in his critique of Scala and its community. He was wrong in his prediction of Scala’s success. The Scala ecosystem is simply not competitive against any other enterprise ecosystem, Java or otherwise. It’s a true outlier. The window of opportunity for Scala is effectively closed, unless a niche market is your definition of success. And I suspect it is for most Scala users.

    There – I said it. I realize this is about culture for nearly all Scalanistas. I get it. You are fighting for the preservation of your culture, and you are, indeed, winning. Just be careful what you wish for, my friends. ;)

