Monday, May 11, 2015

What the tag line from Highlander can teach us about Exlusion Constraints

If you are trying to get your head wrapped around exclusion constraints via a Google search you may get the impression that the only thing it's good for is solving the double booking problem common to reservation systems. The PostgreSQL documentation expands the possibility a teeny tiny bit by mentioning preventing overlapping circles. But that's it! So here is this truly phenomenal feature that thanks to an absence of varied examples you may not think you have a use for.

I'm here to expand the possibilities by introducing what I call the the Highlander use case. But before I do I want to explain why all the examples you find on Google are about the double booking problem. The fact is prior to the existence of exclusion constraints the double booking problem was a really really really really hard problem to solve in the relational database world. And if you managed to solve it, chances are you really didn't, and it was slow at scale because of all the locking required. So when Jeff Davis brought exclusion constraints to PostgreSQL all the write-ups naturally focused on the fact that there was finally a safe and scalable solution to the double booking problem.

What are exclusion constraints? Before we answer that let's start by explaining what a constraint, in general, is. A constraint is a rule expressed as statements that evaluate to a Boolean value (i.e., true or false). In essence a constraint is simply a Boolean expression. Constraints are normally checked when you try to INSERT a new row into a table or try to update an existing row. If the expression evaluates to true the INSERT or UPDATE is allowed to proceed, otherwise it fails and an error is raised.

The 4 constraints that just about everyone who has created a relational database is familiar with is:
  1. NOT NULL - A column cannot accept NULL as a value.
  2. UNIQUE - A column or combination of columns cannot have duplicates.
  3. PRIMARY KEY - Combination of #1 & #2.
  4. FOREIGN KEY - A column (or columns) in Table1 must have a matching entry in Table2.
An exclusion constraint is just a supped up unique constraint in that it has a lot more expressive power to define exactly what unique means. Now on w/ the use case.

Let's say you sell a system where owners create events and their customers attend the events. One of the features you probably want in this system is the ability for customers to provide feedback on their experience. So basically you want surveys.

Lets assume that there are more than one type of event, therefore you may want to have more than one survey and would like to associate specific surveys w/ specific events. Let's visualize that a tiny bit:
Event_1 -> Survey_1
Event_2 -> Survey_2
Event_N -> Default Survey.

Note Event_N and Default Survey. Event_N is any event that has no explicit survey associated w/ it. And when the system sees one of these it needs to send the default survey to the customers that attended those events. So that means we need a way to mark a survey as the default survey. (Hold on to your hat folks the Highlander reference is almost here!) And since, logically speaking, "there can be only one" default survey it would be great if there was a way to express that at the database level because we can't depend on the application nor people not to violate the rule that there should be only one default survey.

So let's quickly summarize what the Highlander use case is by comparing it to the standard unique constraint. With a standard unique constraint there can be no duplicates, period. But in the Highlander use case, you take one value and make it special by saying you can duplicate anything else, but not this; for this, "there can be only one"!

Exclusion constraints makes addressing the Highlander use case trivial. Let's look at some SQL.
CREATE TABLE surveys (
  id SERIAL PRIMARY KEY,
  name VARCHAR(64) NOT NULL UNIQUE,
  survey_questions JSONB NOT NULL,
  is_default BOOLEAN NOT NULL DEFAULT FALSE

The is_default column solves the identity problem. Next we'll finish the SQL off w/ our exclusion constraint:
...,
  EXCLUDE (is_default WITH =) WHERE (is_default)
);
EXCLUDE (is_default WITH =) is equivalent to having a unique constraint on is_default. But that's not enough, for the same reason that a unique constraint doesn't solve the problem. So we need more expressive power. Exclusion constraints gives us that expressive power by allowing us to specify multiple comma separated criteria in the EXCLUDE block and we can specialize even more by adding a WHERE clause. So that's what we've done here. In layman's terms our exclusion constraint says, "the value of the is_default column of the row being inserted or updated must not equal any other row's is_default column value, but only when is_default IS TRUE". This means we can have a gazillion rows where is_default IS FALSE but only one row where is_default IS TRUE.

If you have a good grasp of SQL's syntax in general and PostgreSQL in particular, then the exclusion constraint syntax should be obvious. But if you are thrown by the double use of is_default let me explain.

The first one is a reference to a column's name. In this case the column that we want to reference is named "is_default". The second one needs to be understood in the context of how WHERE clauses work. All WHERE clauses must be Boolean expressions, and since is_default is defined as a Boolean it's by definition a Boolean expression too. So we take advantage of that fact and make it the entirety of the WHERE clause. But if we wanted to be verbose we could have written the WHERE clause as:
WHERE (is_default IS TRUE)
or as
WHERE (is_default=true)

[UPDATE:]Thanks to an anonymous commenter I've learned that PostgreSQL is even cooler than I gave it credit for because the Highlander use case is better solved w/ a pgified unique constraint. Basically, PostgreSQL allows us to apply a WHERE clause to your basic unique constraint, giving us the same expressive power that I was using an exclusion constraint for, but w/ better performance. The lesson here is simple, if you are coming to PostgreSQL from another database or have been living the lie of using only spec compliant syntax, read the PostgreSQL docs carefully because chances are PostgreSQL improves upon the standard in extremely powerful ways (a.k.a, pgified).

For completeness let's include the new version of the DDL:
CREATE TABLE surveys (
  id SERIAL PRIMARY KEY,
  name VARCHAR(64) NOT NULL UNIQUE,
  survey_questions JSONB NOT NULL,
  is_default BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE UNIQUE INDEX ON surveys (is_default) WHERE (is_default);