ActiveRecord race conditions

Ara Howard has discovered that the ActiveRecord validation mechanism does not ensure data integrity.[1] Validations feel a bit like database constraints, but it turns out they are really only useful for producing human-friendly error messages.

This is because the assertions they define are tested by reading from the database before the changes are written to the database. As you will no doubt recall, phantom reads are not prevented by any isolation level other than serializable. So unless you are running your database in serializable isolation (and you aren’t, because nobody does), the use of ActiveRecord validations sets up a classic race condition.
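
To make the window concrete, here is the pattern in miniature. This is a sketch only; the Account model and its email column are hypothetical:

```ruby
class Account < ActiveRecord::Base
  # validates_uniqueness_of issues a SELECT looking for a matching row
  # before the INSERT; nothing stops another session from inserting a
  # duplicate between those two statements.
  validates_uniqueness_of :email
end

# Session A: Account.create(email: "x@example.com")  # SELECT finds no row
# Session B: Account.create(email: "x@example.com")  # SELECT finds no row
# Both INSERTs then succeed, and the "unique" column holds a duplicate,
# unless the database itself enforces uniqueness.
```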

On my work project we found this out the hard way. The appearance of multiple records that should have been blocked by the validations was a bit surprising. In our case the impacted models happened to be immutable, so we only had to solve this for the #find_or_create case. We ended up reimplementing #find_or_create so that it does the following (a sketch in code follows the list):

  1. do a find

    1. if we found a matching record, return the model object
    2. if it does not exist, create a savepoint

  2. insert the record

    1. if the insert succeeds, return the new model object
    2. if the insert fails, roll back to the savepoint

  3. re-run the find and return the result
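
Here is a minimal sketch of that flow. It assumes a unique database constraint backs the lookup and uses transaction(requires_new: true), which opens a savepoint when a transaction is already active; the Widget model and its name column are hypothetical:

```ruby
def find_or_create_widget(name)
  # Step 1: the initial find.
  widget = Widget.find_by(name: name)
  return widget if widget

  # Step 2: insert inside a savepoint so a duplicate-key failure does
  # not poison any enclosing transaction.
  Widget.transaction(requires_new: true) do
    Widget.create!(name: name)
  end
rescue ActiveRecord::RecordNotUnique
  # Step 3: the insert lost the race. The savepoint has been rolled
  # back, so re-run the find and return the row the other session won.
  Widget.find_by!(name: name)
end
```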

This approach does require the use of database constraints. But having your data integrity constraints separated from the definition of the data model has always felt a bit awkward to me, so I think this is more of a feature than a bug.

It would be really nice if this behavior were included by default in ActiveRecord. A similar approach could be used to remove the race conditions in regular creates and saves by simply detecting the insert or update failures and re-executing the validations. This would not even require that the validations/constraints be duplicated. The validations could, in most cases, be generated mechanically from the database constraints. For example, DrySQL already does this.
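
A sketch of what that detect-and-revalidate fallback could look like; the helper name is mine, and it assumes the constraint violation surfaces as ActiveRecord::RecordNotUnique:

```ruby
# Skip the pre-flight validation queries, let the database constraint
# arbitrate, and re-run the validations only when the write fails,
# purely to build the friendly error messages.
def save_with_constraint_fallback(record)
  record.save(validate: false)
rescue ActiveRecord::RecordNotUnique
  record.valid?   # populates record.errors for display
  false
end
```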

Such an approach would provide the pretty error messages Rails users expect, neatly combined with the data integrity guarantees that users of modern databases expect.


  [1] You simply must love any sample code that has a method Fubar.hork_the_db.

11 Comments

  1. James Bennett wrote:

    Out of curiosity, why not pin down the parts of the application which can expose race conditions, and have them manually set their particular transactions serializable so that the DB can help you out? That would seem to be the more natural thing, and works with the database rather than against it.
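
    For example, something along these lines (a sketch; it assumes the adapter will pass the statement through, and on PostgreSQL the SET must be the first statement in the transaction):

    ```ruby
    ActiveRecord::Base.transaction do
      ActiveRecord::Base.connection.execute(
        "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"
      )
      # ... the race-prone find-then-create work goes here ...
    end
    ```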

    Posted 23 Nov 2007 at 4:13 am
  2. Stephen wrote:

    I am a fan of Ruby on Rails, but this whole ‘no constraints in the database’ is just plain wrong.

    I am a database programmer by trade and the ONLY way to ensure data integrity is to use database constraints. In your solution above, you still may have a problem if you use Rails transactions:

    session 1: find record
    session 2: find record
    (neither finds a record)
    session 1: create record, but no commit
    session 2: create record, but no commit
    (now both records are in the database, neither committed)

    session 1: commit some time later
    session 2: commit some time later

    Session 2 will fail on commit if there is a unique constraint preventing the duplicate, but not at the point where you inserted the record.

    At least the constraint stops the bad data getting in, but the error raised may be at a different line of Ruby code than you expect.

    The Rails validations (especially validates_uniqueness_of) are, in my mind, purely for nice error messages 99% of the time. As well as causing race conditions like the one you discovered, they are inefficient: on a heavily inserted table, Rails has to do a SELECT to ensure uniqueness and then the INSERT, whereas if you just put a unique constraint on the table and do the insert, you save the SELECT. This would only ever be an issue on a very heavily inserted table with a unique column, which is probably rare!

    Posted 23 Nov 2007 at 8:49 am
  3. jay wrote:

    “I am a fan of Ruby on Rails, but this whole ‘no constraints in the database’ is just plain wrong.”

    I don’t think anyone of any significance has said this. Please correct me if I’m wrong. What I have heard, and have said myself, is that I prefer my business logic in my app instead of in the db, i.e. I prefer Ruby code to stored procedures. Every serious Rails app I’ve worked on, or know about through friends, has db constraints.

    Posted 23 Nov 2007 at 11:56 am
  4. Peter Williams wrote:

    Jay,

    It may be true that very few people have said anything as direct as ‘don’t use database constraints’, but it is definitely the feeling I get from the Rails community. And there is certainly no real support for DB constraints in AR. You cannot even define database constraints without resorting to executing straight SQL in your migrations.
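
    For example, a sketch of that escape hatch; the table and constraint names are hypothetical:

    ```ruby
    # Declaring a foreign key from a migration with execute, since the
    # schema DSL cannot express it.
    class AddWidgetOwnerConstraint < ActiveRecord::Migration
      def self.up
        execute "ALTER TABLE widgets ADD CONSTRAINT widgets_owner_fk " +
                "FOREIGN KEY (owner_id) REFERENCES owners (id)"
      end

      def self.down
        execute "ALTER TABLE widgets DROP CONSTRAINT widgets_owner_fk"
      end
    end
    ```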

    Posted 23 Nov 2007 at 1:18 pm
  5. Peter Williams wrote:

    James,

    That is an interesting idea. To be perfectly honest, it had not occurred to me to change the isolation level just for the requests that need it. In Rails this might be a little difficult because normally the transaction is already started before the action code gets invoked. But I will have to look into it and see if it is possible.

    Posted 23 Nov 2007 at 1:21 pm
  6. Peter Williams wrote:

    Stephen,

    I definitely lean toward putting at least some constraints in the db. It gives you a level of certainty about the cleanliness of the data that just cannot be achieved in application code. I have a post on this topic percolating, so I will just wait for it to talk more about this.

    Posted 23 Nov 2007 at 1:23 pm
  7. James Bennett wrote:

    Peter, it’s been ages since I’ve done anything with Rails (back in the pre-1.0 days, actually), but I’d be surprised if there wasn’t a way to send a “SET TRANSACTION ISOLATION LEVEL SERIALIZABLE” up-front.

    I don’t know what the exact Ruby idiom would be, but in the Python DB adapters doing this means you’ll get an exception raised from the DB if it detects someone else working on the same data, and you just catch that, abort and try again.
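
    A rough sketch of that idiom, assuming an ActiveRecord where transaction accepts an isolation option (the helper name is hypothetical):

    ```ruby
    # Run the block in a serializable transaction, retrying a few times
    # if the database aborts it with a serialization failure.
    def with_serializable_retry(attempts = 3)
      begin
        ActiveRecord::Base.transaction(isolation: :serializable) { yield }
      rescue ActiveRecord::SerializationFailure
        attempts -= 1
        retry if attempts > 0
        raise
      end
    end
    ```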

    Posted 23 Nov 2007 at 8:31 pm
  8. tobi wrote:

    Your article is wrong; Rails has supported this since 1.0.

    Add a column called lock_version and Rails will increment this with every save.

    It will automatically implement optimistic locking for you and will raise an exception when you are trying to save stale data.

    Rails also has row-level locking built in. When you find() a row you can pass :lock => true and it will do a SELECT ... FOR UPDATE.
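
    Both facilities in miniature (the Account model is hypothetical; Account.lock.find is the newer spelling of :lock => true):

    ```ruby
    # Optimistic locking: with a lock_version column on the table,
    # saving stale data raises instead of silently overwriting.
    account = Account.find(1)
    account.balance += 10
    account.save!   # ActiveRecord::StaleObjectError if another session
                    # saved the row after our read

    # Pessimistic locking: SELECT ... FOR UPDATE for the duration of
    # the transaction.
    Account.transaction do
      locked = Account.lock.find(1)
      locked.update!(balance: locked.balance + 10)
    end
    ```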

    Posted 23 Nov 2007 at 10:15 pm
  9. Peter Williams wrote:

    Tobi,

    Those two facilities don’t help with this set of issues. These race conditions relate to ending up with two separate records that should be mutually exclusive based on the validations.

    For example, in my case I was inserting two logically identical records simultaneously. Both processes that wanted to insert the record checked for its existence, saw that it did not exist, and then inserted it. No amount of row-level locking or row versioning would have helped.

    Further, I think it would be quite easy to argue that both of those approaches are pretty kludgey in most situations.

    Posted 24 Nov 2007 at 3:39 pm
  10. Stephen wrote:

    Peter/Jay,

    I totally understand why people would like not to put constraints in the database: it means you have to get your hands dirty with the database, which is something many application developers don’t like. The Rails community may not say ‘constraints are wrong’, but the implication is that you should feel dirty if you use them.

    This sort of race condition problem is something that just cannot be efficiently solved without DB constraints. In fact, I think that unless you use a lock instead (which is a horrible, unscalable solution) you cannot guarantee the uniqueness of your data without DB constraints.

    Serializable transactions don’t enforce the uniqueness of a column unless there is a unique constraint involved too.

    Posted 25 Nov 2007 at 12:03 pm
  11. Peter Williams wrote:

    Stephen,

    Serializable transactions would help in this case, because a phantom read (the initial existence check did not return a record at the time it ran that it would have returned at the end of the transaction) would have occurred in every transaction except the first one to commit. Because phantom reads are not allowed in serializable isolation, every transaction except the first to commit would fail.

    Posted 25 Nov 2007 at 3:33 pm