
Why We Started Using PostgreSQL with Slick Next to MongoDB

At Plot Projects we initially used MongoDB for persistent storage. However, we recently moved part of our data storage from MongoDB to a PostgreSQL database. The only data still kept in MongoDB is the data collected from mobile devices.

This article starts with a short introduction to why we moved from MongoDB-only to polyglot storage. After that, I explain how Slick can be used to read data from a SQL database.




Why we moved

There were multiple reasons to move a large part of our data to PostgreSQL. Our data model was hard to fit into an aggregate structure: documents became too large, and data either had to be copied between documents or combined through multiple queries. Data was often duplicated just so that indexes could keep query performance acceptable. Without joins and referential integrity, it is hard to keep such duplicated data consistent.

A problem we hit with MongoDB was that only values smaller than 1 KB can be indexed (see MongoDB Limits and Thresholds for more info). We ran into this limit when querying for location alerts that a device had not yet seen. These views were stored in an array within the document. The schema was similar to this:

{
 "message": "Example alert",
 "seenBy": [...]
}

with the query:

{
 "seenBy": { "$ne": "user" }
}

To keep this query fast enough, an index on seenBy is essential. However, that field quickly grew larger than 1 KB, so it could no longer be indexed. When we hit this problem, we started migrating that data to PostgreSQL.

PostgreSQL provides joins, so duplicating data was no longer required while still having indexes on the relevant fields. After migrating to PostgreSQL this query became trivial:

SELECT message
FROM messages
WHERE id NOT IN (
  SELECT messageId
  FROM views
  WHERE "user" = 'user'
)

Other advantages we found in PostgreSQL over MongoDB were better tooling and a strict schema. For most of our data the schema is constant: the column names don't change and most columns are mandatory, so making the schema explicit is in this case an advantage.
We keep using MongoDB for statistics data, because it is easier to shard and enforces no schema. The raw statistics data is more freeform, so a fixed schema there would be inconvenient.

What is Slick?

Slick calls itself a database query library for Scala. It is a layer on top of JDBC offering a Scala API to send queries to relational databases.

In Slick you can map a relation (table) from the database to a Scala tuple or a case class. Because of these mappings your queries become strongly typed. Slick converts the Scala types to the right SQL types. Queries can be done with almost the same syntax as doing read operations on collections.

Beside PostgreSQL, the following database management systems are also supported:

  • DB2*
  • Derby/JavaDB
  • H2
  • HSQLDB/HyperSQL
  • Microsoft Access
  • Microsoft SQL Server
  • MySQL
  • Oracle*
  • SQLite

*Requires subscription for production use

How to use Slick?

Before you can fire queries at your database, you have to set up a connection. This is similar to setting up a JDBC connection.

Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver") withSession {
  //session is now implicitly available in thread-local storage
}

This sets up a connection and makes the session available to the current thread. Update queries are automatically committed. When you don't want that, you can use transactions: when the code within the block executes successfully, a single commit is performed; when it throws an exception, the transaction is rolled back.

Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver") withTransaction {
  //session is now implicitly available in thread-local storage
}

Before you can pull data from a table, you have to make the table definition available to Slick. You do that by creating a Table object, in which you define the columns and which column is the primary key. You also have to define the * method, which lists all columns of the default projection.

object CoffeeTable extends Table[(String, BigDecimal, Int)]("COFFEE") {
  def name = column[String]("NAME", O.PrimaryKey)
  def price = column[BigDecimal]("PRICE")
  def sales = column[Int]("SALES")
  def * = name ~ price ~ sales
}

When this is in place, you can perform a simple SELECT query. The following query selects only the names of coffees that have a price greater than or equal to the minimum price of 1.0. The other fields aren't included in the SELECT.

val minPrice: BigDecimal = 1.0
val query = for {
  c <- CoffeeTable if c.price >= minPrice
} yield c.name

val names = query.list

The following query selects all columns of the coffees that have a price greater than or equal to the minimum price of 1.0. The result is a list of tuples.

val minPrice: BigDecimal = 1.0
val query = for {
  c <- CoffeeTable if c.price >= minPrice
} yield c

val coffees = query.take(5).list

Most of the time it is more convenient to get the data in case classes instead of tuples. The definition to directly map the table to the case class is below:

case class Coffee(name: String, price: BigDecimal, sales: Int)
object CoffeeTable extends Table[Coffee]("COFFEE") {
  def name = column[String]("NAME", O.PrimaryKey)
  def price = column[BigDecimal]("PRICE")
  def sales = column[Int]("SALES")
  def * = name ~ price ~ sales <> (Coffee, Coffee.unapply _)
}

To retrieve data from the table, the queries stay the same. However, the second query no longer returns tuples, but Coffee case class instances.

To insert a new coffee into our table you can call the insert method on the table object. You can directly insert the case classes.

CoffeeTable.insert(
 Coffee("Java", 2.50, 0)
)

To update rows in the table, you first define a query that selects the elements to be updated. Then you call update on the query with the new values for the yielded columns (or a replacement case class instance when you yielded the complete row).

val query = for {
  c <- CoffeeTable if c.name === "Java"
} yield c.price

query.update(2.75)

Currently it is only possible to replace values with new ones; it isn't possible to base the new value on the current value of the cell or on another cell. To delete rows you can use the delete method on the query instance.

query.delete

When you want to perform a query that cannot be expressed with Slick expressions, you can still write the query yourself.

val query = StaticQuery.query[BigDecimal, String]("""
  select c.name
  from coffee c
  where c.price < ?
""")

val names = query(1.0).list

Other features

What I haven't shown so far are queries with parameters, the use of custom types in queries, and "direct embedding".

When you want to reuse queries, you can make use of query templates: queries with parameters. This removes the need to rebuild the query tree every time you want to perform the query. You can read more about it in the manual.
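As a minimal sketch, a query template for the CoffeeTable defined earlier could look like this. It assumes Slick 1.x's Parameters object from the lifted embedding; the template name byMinPrice is my own:

```scala
import scala.slick.driver.PostgresDriver.simple._

// A query template: the query tree is built once and parameterized
// on the minimum price; only the bind parameter changes per call.
val byMinPrice = for {
  minPrice <- Parameters[BigDecimal]
  c <- CoffeeTable if c.price >= minPrice
} yield c.name

// Each invocation reuses the prepared query with a new parameter.
val cheap = byMinPrice(1.0).list
val premium = byMinPrice(3.0).list
```

Both calls run the same prepared statement, which avoids the cost of recompiling the query tree on every request.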

Slick provides mappings for many commonly used types in Scala. Often it is convenient to use your own types as well. Slick provides the means to map your own types to SQL types, so you can use them directly in queries. We use that to map DateTime objects from the Joda-Time library to the SQL Timestamp type.
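A sketch of such a mapping, assuming Slick 1.x's MappedTypeMapper and the Joda-Time library on the classpath:

```scala
import java.sql.Timestamp
import org.joda.time.DateTime
import scala.slick.lifted.MappedTypeMapper

// Teach Slick to store Joda-Time DateTime values as SQL timestamps.
implicit val dateTimeMapper =
  MappedTypeMapper.base[DateTime, Timestamp](
    dt => new Timestamp(dt.getMillis), // DateTime -> database value
    ts => new DateTime(ts.getTime)     // database value -> DateTime
  )

// With the implicit mapper in scope, a column can use DateTime directly:
// def createdAt = column[DateTime]("CREATED_AT")
```

Once the implicit is in scope, DateTime columns can be filtered and yielded in queries like any built-in type.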

Direct embedding is an experimental feature that allows case classes to be directly bound to tables, instead of having a mapping table object. It uses the new macro functionality from Scala. As said, this feature is still experimental, so the syntax may change in future versions.

Pros and cons

Slick is a library that sticks close to SQL. Therefore the performance implications of queries stay clear, as does how the classes are mapped to the data. It hides DBMS-specific syntax and prevents you from writing code that is vulnerable to SQL injection. Also, because of its design, most queries can already be checked at compile time.

A disadvantage is the embedded DSL, which is not always intuitive. When you make a mistake in the syntax, the compile errors are often difficult to understand. The documentation is also not comprehensive on many subjects. Finally, the focus on single-threaded usage makes combining Slick with Futures difficult.

A workaround for multithreaded use is to not use withSession, but to create sessions yourself and close them when your request is completed. Keep in mind that sessions are stateful, so don't use the same session from multiple threads at the same time.
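A sketch of that workaround, managing the session lifecycle explicitly instead of through withSession:

```scala
// Create the session yourself; each request gets its own session.
val db = Database.forURL("jdbc:postgresql://localhost/presentation",
  driver = "org.postgresql.Driver")

val session = db.createSession()
try {
  // Make the session implicitly available to query invocations.
  // Never share one session between threads running at the same time.
  implicit val s = session
  val names = query.list
} finally {
  // Always return the underlying connection, even on failure.
  session.close()
}
```

The try/finally guarantees the session (and its JDBC connection) is released when the request completes, whether it succeeded or threw.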

For transactions a more hacky workaround is needed, because some methods (for example insertAll) start a transaction themselves when not already in one. When you start the transaction yourself, you have to tell the session object about it.

val session = Database.forURL("jdbc:...", driver = "...").createSession()
session.conn.setAutoCommit(false)
//prevent Slick from creating transactions itself
session.asInstanceOf[BaseSession].inTransaction = true
…
session.conn.commit()
session.close()

A good resource for Slick is the website:
http://slick.typesafe.com/
Another good resource is the Google Group:
https://groups.google.com/forum/?fromgroups#!forum/scalaquery
The URI still contains the old project name, ScalaQuery, but the group is really about Slick now.


