Structural Data in Rails

On a recent project I ran into a situation where I needed some structural data. I was writing a conference registration application. Each track at the conference costs a different amount, and attendees can sign up for more than one track. We already have an accounting infrastructure that has a concept of a “product”, which is just something for which we charge money. So to support registering attendees for tracks, taking their money, and letting the bean counters know that we have taken their money, I needed to add some new products representing the registration for each track. However, these products are far from mere cruft needed by the accounting infrastructure; the conference registration code is completely dependent on their existence. There is a set of check boxes on the registration page that the controller maps to the appropriate products in the products table. That means that the conference registration code will simply not work without those products.

These products are the sort of data I am referring to when I say structural data. In my mind, any data whose absence would cause a failure, or that is managed as part of the development process rather than in the application runtime, is structural data. This sort of data (dare I say “pattern”?) occurs fairly frequently, in my experience, and can be used to great effect.
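To make that dependency concrete, here is a minimal plain-Ruby sketch of how checked boxes might be mapped to product rows. Everything named in it is a hypothetical stand-in rather than the project's actual code: the Product struct, the track codes, the prices, and the products_for and registration_total helpers. An in-memory array plays the role of the products table.

```ruby
# Hypothetical stand-in for a row in the products table.
Product = Struct.new(:id, :code, :price_cents)

# Hypothetical structural data; in a real application these rows would be
# inserted into the database, not defined in code.
PRODUCTS = [
  Product.new(1, "track_a", 30_000),
  Product.new(2, "track_b", 45_000)
]

class MissingStructuralData < StandardError; end

# Map the track codes checked on the registration form to product rows.
# A missing product is a hard failure: there is nothing sensible to charge.
def products_for(checked_codes, products = PRODUCTS)
  checked_codes.map do |code|
    product = products.find { |p| p.code == code }
    raise MissingStructuralData, "no product for track #{code.inspect}" if product.nil?
    product
  end
end

# Total amount owed for the selected tracks, in cents.
def registration_total(checked_codes, products = PRODUCTS)
  products_for(checked_codes, products).inject(0) { |sum, p| sum + p.price_cents }
end
```

With an empty products list, every call to products_for with at least one checked track raises, which is the sense in which the code "will simply not work" without its structural data.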

Given that the code is tightly coupled to structural data, it makes sense to manage structural data in the same way you manage the database schema. If you are using Rails, odds are you are managing your schema with migrations (and if you aren’t, you should be). Migrations are a great way to manage a rapidly changing database schema, and they easily support creation and modification of structural data. Using migrations in this way has several benefits. It keeps versions of the structural data associated with the versions of the code with which they were intended to work (just check out the project from Subversion and you are ready to go). It ensures that those products get inserted when the conference registration code gets deployed (migrating the database is part of the deployment process). And finally, it places those vital database records under revision control.

The Problem

This technique works very well with the exception of testing. Unfortunately, the way Rails handles test data means that you are forced to repeat any structural data in both the migrations and the fixtures. When a test run is started in Rails it purges the test database, then recreates it by copying the schema of the development database. (The details of how exactly this schema copying happens vary depending on the schema format you are using and whether you are using migrations, but the end result is the same: no matter what, you end up with an exact duplicate of your development database’s schema.) This approach is not completely unreasonable, since your test database always has the same structure as your development database. However, I see several problems with it.

Cloning the development database assumes that the development database is up to date. Most of the time it is, but if you checked out new code and forgot to run rake migrate, your development database could quite easily be out of date. If this happens you are going to see test failures, and the reason is not going to be immediately obvious (I can hear it now, “but it works fine in dev…”).
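One way to picture the stale-database case is to compare the version the development database reports with the highest-numbered migration file on disk. The sketch below is plain Ruby and only illustrates that comparison; the helper names and the sample filenames are assumptions, and it relies on migration files carrying a numeric prefix such as 001_.

```ruby
# Extract the numeric prefix from a migration filename such as
# "db/migrate/003_add_conference_products.rb". Returns 0 when there is none.
def migration_version(filename)
  File.basename(filename)[/\A(\d+)_/, 1].to_i
end

# Highest migration version among the given files (0 for an empty list).
def latest_migration_version(filenames)
  filenames.map { |f| migration_version(f) }.max || 0
end

# True when the database is behind the newest migration on disk.
# schema_version is whatever version the development database reports.
def development_schema_stale?(schema_version, filenames)
  schema_version < latest_migration_version(filenames)
end
```

A development database at version 1, checked against files numbered 001 and 002, would be reported as stale, and cloning it would hand the tests a schema the current code was never written for.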

Cloning assumes that the development database is the authoritative version of the schema. In my world it is not. The migrations are the authoritative version of the schema. When I go to production I am going to do it by executing the migrations not by dumping my development database schema.

Cloning the database is duplicative. We already have a perfectly good way to create the needed database schema, namely the migrations that are going to run in production to produce the schema against which this code will run. Why have more code to achieve the same result of building a schema?

Cloning the development database assumes that the structure is all that is important. This completely ignores the structural data which is just as important as the physical structure of the database. To work around this you have to duplicate this structural data in both the migrations and the test fixtures. And I despise repeating myself when I am programming…

The solution: schema_format = :migration

I finally got around to creating a solution (you can download the plugin here) that avoids all these problems. This plugin introduces a new schema format of :migration. This schema format treats the migrations you have put so much time and effort into creating as the final authority for what belongs in a database for your application. With this plugin installed and enabled, tests will start by purging the existing test database and then running the migrations, in order, from 001 to the most recent migration. This guarantees that the tests will be run against the most recent schema that could possibly make its way to production.
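The purge-then-migrate sequence can be modelled in a few lines of plain Ruby. This is not the plugin's code, only a toy illustration of the idea: the Migration struct, the rebuild_test_database helper, and the sample migrations are hypothetical, and a hash of arrays stands in for the database.

```ruby
# A toy migration: a version number, a name, and a block that changes the
# database. The "database" is a hash of table name => array of row hashes.
Migration = Struct.new(:version, :name, :up)

# Start from an empty database (the purge), then apply every migration in
# ascending version order, regardless of the order they were listed in.
def rebuild_test_database(migrations)
  db = {}
  migrations.sort_by { |m| m.version }.each { |m| m.up.call(db) }
  db
end

# Hypothetical migrations: one creates the table (schema), the other
# inserts the structural data the registration code depends on.
MIGRATIONS = [
  Migration.new(2, "add_conference_products", lambda { |db|
    db[:products] << { id: 1, code: "track_a" }
    db[:products] << { id: 2, code: "track_b" }
  }),
  Migration.new(1, "create_products", lambda { |db| db[:products] = [] })
]
```

Because the structural rows are inserted by a migration, rebuilding the test database this way yields both the schema and the data, with no dependence on the state of the development database.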

This solves the first two issues I raised above. We will ignore the third issue, duplicative code, because the existing code must remain for compatibility reasons and it does not directly impact us, anyway. The fourth issue, structural data, is handled by the plugin also. At first blush it might appear that the behavior I described above would be sufficient to solve this issue also but it is not. This issue remains because the Fixtures code in ActiveRecord actively deletes all rows from a table before loading the fixture data into that table.

Purging a table before loading the fixture data helps isolate tests from one another by ensuring that a test will never get data that has been modified by a previous test. With transactional fixtures this is less of an issue, but even with transactional fixtures there are situations where modified data will not be removed at the end of a test. (For example, if a test fails, the data it modified or created will not be removed. This makes failed tests easier to debug because the database is left exactly as the test left it. On the other hand, this is only really useful for the very last test that fails.) Unfortunately, we usually want fixture data even for tables that contain structural data. This is because the structural data we currently have may not fully exercise the functionality of the associated model class. Having fixture data means that at some point those tables will get wiped.

To avoid this problem the migration schema format plugin includes functionality to protect records in the database that are not fixture data. This is achieved by changing the table purging behavior of fixture loading. Rather than purging the entire table the fixture loading code only deletes the record that has the same primary key as the fixture it is currently loading. This means that your fixture data and structural data can live in peace and harmony. The only constraint is that fixture data must never use a primary key that is also used by a piece of real structural data. That constraint is easy to deal with simply by using large values for the primary key in fixture data that needs to play nice with structural data.
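The difference between the two loading strategies is easy to see in a small plain-Ruby model. This is a sketch of the behavior described above, not the plugin's implementation; the method names are hypothetical, a table is modelled as an array of row hashes, and :id stands in for the primary key.

```ruby
# Load fixtures without purging the table: only a row whose primary key
# collides with the fixture being loaded is deleted before the insert.
def load_fixtures_shielded(table, fixtures)
  fixtures.each do |fixture|
    table.delete_if { |row| row[:id] == fixture[:id] }
    table << fixture
  end
  table
end

# For contrast: purge the whole table, then load the fixtures.
def load_fixtures_purging(table, fixtures)
  table.clear
  table.concat(fixtures)
end
```

Given a table holding a structural row with id 1 and a fixture with id 1_000_001, the shielded loader leaves both rows in place, while the purging loader leaves only the fixture. Loading the same fixtures twice with the shielded loader does not create duplicates, because each fixture first removes any row with its own key.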

Enabling the migration schema format is easy:

1. Download the tar and unpack it into your vendor/plugins directory.
2. Edit your config/environment.rb to include the line config.active_record.schema_format = :migration within the Rails::Initializer.run do |config| block.
3. Add the line require 'shield_nonfixture_data' to test/test_helper.rb immediately after require 'test_help'.

And voila, you can test using migrations and structural data.

10 thoughts on “Structural Data in Rails”

  1. Yup. Having vital structural data (or application data as I call it) not managed by SCM is deadly. Migrations are one approach that definitely works quite well for a smaller amount of data. Another approach we have taken at PLANET ARGON is to keep a separate directory of fixtures containing information for structural data, along with a Rake task to manage that structural data.

    For the Globalize plugin (http://globalize-rails.org/), we had a large dataset (read: nearly 4000 records) managed by a migration. I found this data much easier to manage after making a custom set of Rake tasks and separating the data into its own directory. Rake is good.

  2. Interesting approach… I can see that the migration approach might become an issue with large amounts of data.

    Did you have any purely test data for the tables that contained this large set of structural/application data? Or did you rely on the real data being sufficiently varied to support effective testing?

  3. Does this work against Ruby on Rails 1.1.2? I just tried
    it against my application and it doesn’t work:

    $ psql -U spi-ed-test spi-ed-test 0 AND NOT a.attisdropped
    ORDER BY a.attnum
    from

    It doesn’t appear to be running the migrations.

    They redid much of the rake in the 1.0 to 1.1 transition, so maybe
    this plugin only works with 1.0?

    Also, to clear the database, it would be nice to use the reverse
    migration by passing nil to the migrate function:

    ActiveRecord::Migrator.migrate("db/migrate/", nil)

    Also, consider putting this plugin in the wiki at

    http://wiki.rubyonrails.org/rails/pages/Plugins

    so more people can find it. It was only through Googling that I
    refound this blog.

    Regards,
    Blair

  4. Peter,
    Thanks for the code! This is _exactly_ what I’ve been looking for.

    Though I haven’t worked on the fixture protection piece yet, I’ve ported the rakefile to 1.1.2 format. It took a little more work than I expected, since Rake doesn’t allow one to (easily) delete or redefine tasks, only enhance them. The updated rakefile is at
    http://www.bluevoodoomagic.com/migration_schema_type.rake and it should be backwards-compatible with earlier versions of Rails.

  5. Nice work, it’s hard to believe rails has no support for this itself! However, I’m having a slight problem. When I enable the migration with test database, “rake test” works fine. However, when I execute “all tests” in radrails 0.7, the IDE completely blocks (I have to close it forcefully). I know this is probably a problem with radrails, but maybe you have some idea?

  6. Is this thread still alive? I am running version 1.1.4 of Rails, and the shield_nonfixture_data is not working for me. I think it needs to be updated. Can anyone help?

  7. This plug-in is, for all intents and purposes, an orphan. I am no longer associated with the project in which it originated and I don’t have any need of it in my current projects. Please feel free to take it and modify it if you are so inclined.

    This plugin, as provided here, will only work against rails 1.0. That means it will require some, probably significant, effort to port it to a more modern Rails. Given all that you might consider Mr. Voorhis’ suggestion of using fixtures for this purpose. It would probably be easier.
