Resque-multi-step

I’ve been developing with asynchronous jobs quite a bit lately.¹ There is only one reason to do work asynchronously: it takes too long to do it synchronously.

Fortunately, it turns out that many of these very large workloads are embarrassingly parallel problems. And look, you have several (dozen) workers just waiting to do your bidding. It just makes sense to break these large blocks of work up into many smaller chunks so that they can be processed in parallel.
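Deciding how to break up the work usually comes down to picking a chunk size and creating one job per chunk. Here is a minimal plain-Ruby sketch. The ids and the chunk size of 500 are made up for illustration.

```ruby
# Split a large list of work items into fixed-size chunks, one job per chunk.
# The ids and CHUNK_SIZE are hypothetical values for illustration only.
post_ids = (1..10_000).to_a
CHUNK_SIZE = 500

# each_slice yields consecutive groups of at most CHUNK_SIZE items.
chunks = post_ids.each_slice(CHUNK_SIZE).to_a

puts "#{chunks.size} jobs of up to #{CHUNK_SIZE} items each"
# prints "20 jobs of up to 500 items each"
```

Smaller chunks spread more evenly across workers. Larger chunks mean less per-job overhead.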

Breaking a task up into many small parts comes with some issues. Any task that takes long enough to run asynchronously is probably going to take long enough that you need to track its progress.
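Once a task is split up, tracking progress mostly means counting finished parts against the total. The sketch below shows only that bookkeeping. It is not how resque-multi-step does it. ProgressTracker is a hypothetical name, and threads stand in for workers.

```ruby
# Hypothetical progress tracker for a task split into many parts.
# Illustrates the bookkeeping only; not resque-multi-step's implementation.
class ProgressTracker
  attr_reader :total

  def initialize(total)
    @total = total
    @completed = 0
    @lock = Mutex.new # parts may finish concurrently
  end

  # Each part calls this exactly once when it finishes.
  def part_completed!
    @lock.synchronize { @completed += 1 }
  end

  def completed
    @lock.synchronize { @completed }
  end

  def percent_complete
    return 100.0 if total.zero?
    (completed * 100.0 / total).round(1)
  end

  def finished?
    completed >= total
  end
end

tracker = ProgressTracker.new(8)
threads = Array.new(6) { Thread.new { tracker.part_completed! } }
threads.each(&:join)
puts "#{tracker.completed}/#{tracker.total} parts done (#{tracker.percent_complete}%)"
# prints "6/8 parts done (75.0%)"
```

Real Resque workers are separate processes, so the counter would have to live somewhere shared, such as a Redis key incremented with INCR. The Mutex here only covers threads inside one process.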

Also, the problem is probably not completely parallelizable. Most problems seem to have a large portion of easily parallelized work followed by a bit of work that can only happen after all the parallel work is complete.

These patterns show up often enough that I have gotten tired of repeating myself. Hence was born resque-multi-step. Resque-multi-step is a Resque plugin that provides compound job support complete with progress tracking, error handling, and a completely serial finalization sequence.

Example

Say you want to reindex all the posts in a blog. However, committing to Solr after each post would be excessively slow. (Trust me, it really is.)

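Resque::Plugins::MultiStepTask.create("reindex-#{blog.name}") do |task|
  blog.posts.each do |post|
    # Resque serializes job arguments as JSON, so pass the id, not the record.
    task.add_job ReindexWithoutCommit, post.id
  end

  # Runs only after every ReindexWithoutCommit job has finished.
  task.add_finalization_job CommitSolr
end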

This reindexes all the posts in parallel. Any available worker will pick up a job to reindex a specific blog post. Once all those reindex jobs have completed, the finalization job will be executed.

If you have more than one finalization job, they are executed serially in the order they were added to the task.
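To make the ordering rules concrete, here is a toy, in-process model of a compound task. It runs the regular jobs in parallel, then the finalization jobs one at a time in the order they were added. ToyMultiStepTask is a hypothetical stand-in, not the gem's code, and AnnounceReindexDone is made up. The job classes are stubs that follow Resque's `self.perform` convention.

```ruby
# ToyMultiStepTask is hypothetical. It is NOT resque-multi-step's
# implementation, which hands jobs to real Resque workers.
class ToyMultiStepTask
  def self.create(name)
    task = new(name)
    yield task # caller registers jobs
    task.run
    task
  end

  attr_reader :name

  def initialize(name)
    @name = name
    @jobs = []
    @finalization_jobs = []
  end

  def add_job(job_class, *args)
    @jobs << [job_class, args]
  end

  def add_finalization_job(job_class, *args)
    @finalization_jobs << [job_class, args]
  end

  def run
    # Parallel phase: one thread per job stands in for a pool of workers.
    @jobs.map { |klass, args| Thread.new { klass.perform(*args) } }.each(&:join)
    # Serial phase: starts only after every parallel job has finished,
    # and runs finalization jobs in the order they were added.
    @finalization_jobs.each { |klass, args| klass.perform(*args) }
  end
end

LOG = Queue.new # thread-safe record of what ran, in order

# Stub jobs that only record that they ran.
class ReindexWithoutCommit
  def self.perform(post_id)
    LOG << [:reindex, post_id]
  end
end

class CommitSolr
  def self.perform
    LOG << [:commit]
  end
end

class AnnounceReindexDone # hypothetical second finalization job
  def self.perform
    LOG << [:announce]
  end
end

ToyMultiStepTask.create("reindex-demo") do |task|
  [1, 2, 3].each { |id| task.add_job ReindexWithoutCommit, id }
  task.add_finalization_job CommitSolr
  task.add_finalization_job AnnounceReindexDone
end

events = []
events << LOG.pop until LOG.empty?
p events.last(2) # prints [[:commit], [:announce]]
```

The reindex entries may appear in any order from run to run. The two finalization entries always come last, in registration order.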

Administrivia

If these issues sound familiar, give resque-multi-step a try. It is available as a gem, so installing it is just

gem install resque-multi-step

If you want to contribute, head on over to the GitHub project and hack away. If you come up with something useful, I’ll integrate it posthaste.


  1. resque-fairly was one of the first public outcomes of such work. The fair scheduling it provides is the basis for this effort.

resque-fairly

I have been using Resque quite a bit recently. It is a really nice asynchronous job system based on Redis.

Resque checks the queues for jobs to process in a fixed order. (In alphabetical order, to be precise.) This turns out to be a problem if you want predictable handling times for jobs. For example, consider a system that has queues aaa and zzz. If you add 100 jobs to aaa and 1 job to zzz, the job on zzz will not be processed until aaa has been drained.
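A small simulation makes the cost concrete. The sketch below models one worker that always checks queues in alphabetical order and takes the first job it finds. It ignores Resque internals such as polling intervals. The helper name fixed_order_position is hypothetical.

```ruby
# Simulates a single worker that checks queues in fixed alphabetical order.
# Returns the 1-based position at which `target` is processed (nil if absent).
def fixed_order_position(queues, target)
  queues = queues.transform_values(&:dup) # don't mutate the caller's data
  order = queues.keys.sort
  processed = 0
  loop do
    name = order.find { |q| !queues[q].empty? }
    return nil if name.nil? # every queue is empty
    job = queues[name].shift
    processed += 1
    return processed if job == target
  end
end

# Same shape as the example in the text: 100 jobs on aaa, 1 on zzz.
queues = {
  "aaa" => Array.new(100) { |i| "aaa-#{i}" },
  "zzz" => ["zzz-0"]
}

puts fixed_order_position(queues, "zzz-0") # prints 101
```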

This problem is easily solved by just checking the queues in random order. Over time, every queue will sometimes be checked first, so a few deep queues will not starve the other queues in the system.
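The same simulation with the queue order shuffled before every reservation shows the effect. This only illustrates the idea and is not resque-fairly's code. The helper name and the seed are arbitrary. With two queues, zzz is checked first about half the time, so its job should be picked up around the second reservation on average instead of the 101st.

```ruby
# Simulates a worker that shuffles the queue order before every reservation.
# Illustration only; not resque-fairly's implementation.
def random_order_position(queues, target, rng)
  queues = queues.transform_values(&:dup) # don't mutate the caller's data
  processed = 0
  loop do
    order = queues.keys.shuffle(random: rng)
    name = order.find { |q| !queues[q].empty? }
    return nil if name.nil? # every queue is empty
    job = queues[name].shift
    processed += 1
    return processed if job == target
  end
end

queues = {
  "aaa" => Array.new(100) { |i| "aaa-#{i}" },
  "zzz" => ["zzz-0"]
}

rng = Random.new(42) # seeded so runs are repeatable
trials = 1_000
total = trials.times.sum { random_order_position(queues, "zzz-0", rng) }
puts "average position of the zzz job: #{total.fdiv(trials).round(2)}"
```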

resque-fairly is a Resque plugin that provides that behavior. Just install the gem, add require 'resque-fairly', and Resque will handle queues with approximate fairness.