Resque-multi-step

I’ve been developing with asynchronous jobs quite a bit lately.[1] There is only one reason to do work asynchronously: it takes too long to do it synchronously.

Fortunately, it turns out that many of these very large workloads are embarrassingly parallel problems. And look, you have several (dozen) workers just waiting to do your bidding. It makes sense to break these large blocks of work into many smaller chunks so that they can be processed in parallel.
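To make the chunking idea concrete, here is a minimal in-process sketch. Ruby threads stand in for Resque workers, and `parallel_sum_of_squares` is a hypothetical workload, not part of any library. Under MRI’s global VM lock these threads won’t actually speed up CPU-bound work; the point is only the shape of the solution: split the work, process the chunks independently, then combine the results.

```ruby
# Hypothetical workload: a sum of squares is embarrassingly parallel because
# each chunk can be summed without knowing anything about the other chunks.
def parallel_sum_of_squares(numbers, chunk_size:)
  # Split the work into independent chunks, one thread ("worker") per chunk.
  threads = numbers.each_slice(chunk_size).map do |chunk|
    Thread.new(chunk) { |c| c.sum { |n| n * n } }
  end

  # Combine the partial results; Thread#value waits for each thread to finish.
  threads.sum(&:value)
end

puts parallel_sum_of_squares((1..100).to_a, chunk_size: 10) # prints 338350
```

Note that the final combining step can only run after every chunk is done, which is exactly the “parallel work, then a serial finish” pattern described next.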

Breaking a task up into many small parts comes with some issues. Any task that takes long enough to run asynchronously is probably going to take long enough that you need to track its progress.

Also, the problem is probably not completely parallelizable. Most problems seem to have a large portion of easily parallelized work followed by a bit of work that can only happen after all the parallel work is complete.

These patterns show up often enough that I have gotten tired of repeating myself. Hence resque-multi-step was born. It is a Resque plugin that provides compound job support, complete with progress tracking, error handling, and a completely serial finalization sequence.

Example

Say you want to reindex all the posts in a blog. However, committing to Solr after each post would be excessively slow. (Trust me, it really is.)

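Resque::Plugins::MultiStepTask.create("reindex-#{blog.name}") do |task|
  blog.posts.each do |post|
    # Resque serializes job arguments as JSON, so pass the id rather than the record
    task.add_job ReindexWithoutCommit, post.id
  end

  # Runs only after every reindex job above has completed
  task.add_finalization_job CommitSolr
end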

This reindexes all the posts in parallel. Any available worker will pick up a job to reindex a specific blog post. Once all those reindex jobs have completed, the finalization job will be executed.

If you have more than one finalization job, they are executed serially in the order they were added to the task.
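The execution model described above (parallel jobs, then finalization jobs run one at a time in insertion order) can be sketched in plain Ruby. This is a toy, in-process model under stated assumptions: `ToyMultiStepTask` and its methods are hypothetical, threads stand in for Resque workers, and it is not how resque-multi-step is implemented, since the real plugin works through Resque’s queues.

```ruby
# Toy model only: hypothetical names, threads instead of Resque workers.
class ToyMultiStepTask
  def initialize
    @jobs = []        # parallel jobs, run in any order
    @finalizers = []  # run one at a time, in the order they were added
    @completed = 0
    @lock = Mutex.new
  end

  def add_job(&job)
    @jobs << job
  end

  def add_finalization_job(&job)
    @finalizers << job
  end

  # Progress as [completed jobs, total jobs].
  def progress
    @lock.synchronize { [@completed, @jobs.size] }
  end

  def run(worker_count: 4)
    queue = Queue.new
    @jobs.each { |job| queue << job }
    queue.close # pop returns nil once the queue is closed and empty

    workers = Array.new(worker_count) do
      Thread.new do
        while (job = queue.pop)
          job.call
          @lock.synchronize { @completed += 1 }
        end
      end
    end

    # Wait for every parallel job. If a job raised, join re-raises its
    # exception here and the finalizers are never run.
    workers.each(&:join)

    # Finalization is strictly serial, in insertion order.
    @finalizers.each(&:call)
  end
end

log = []
log_lock = Mutex.new
task = ToyMultiStepTask.new
5.times { |i| task.add_job { log_lock.synchronize { log << "reindex post #{i}" } } }
task.add_finalization_job { log << "commit solr" }
task.run
p task.progress # prints [5, 5]
p log.last      # prints "commit solr"
```

The completion counter doubles as progress tracking: since every finished job increments it, `progress` can be polled at any time while `run` is working.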

Administrivia

If these issues sound familiar, give resque-multi-step a try. It is available as a gem, so installing it is just:

gem install resque-multi-step

If you want to contribute, head on over to the GitHub project and hack away. If you come up with something useful, I’ll integrate it posthaste.


  1. resque-fairly was one of the first public outcomes of this work. The fair scheduling it provides is the basis for this effort.


Task switching in Git

This happens to me pretty often: I start a story, work on it for a while, and then something urgent comes up.[1] The urgent thing needs to be fixed right away, but I have a lot of changes in my working directory. Unfortunately, those changes are incomplete and non-functional.

The usual suggestion is to use git stash. For a long time, I used stash in this situation myself. However, I often found myself lost in the stash list. If you use stash to store unfinished work, your stash list can become quite long. It is easy to forget you have stashed work. It is also easy to run git stash clear and lose that work.

There are lots of situations in which it can be quite a while before you get back to your stashed changes. For example, you might switch tasks because the business deprioritized the feature, or the urgent issue might itself be interrupted by an emergency.

It recently occurred to me that git provides a much more elegant way to deal with unfinished work.

The steps

First, always work in a feature branch. You should be doing this anyway, but this technique requires it.

  1. git add -A (on the feature branch)
  2. git commit -m 'WIP'
  3. Switch branches and fix that urgent issue, using git as you always do.
  4. git checkout <feature-branch>
  5. git reset HEAD~1
  6. Continue where you left off. Once you are ready, commit.

This approach commits your in-progress work to the branch it belongs to, keeping it safe.
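The steps above are easy to wrap in a small script. The sketch below is a hypothetical helper: the class and method names are made up, while the git commands are the ones from the list plus `git log -1 --format=%s` to read the tip commit’s subject. It adds one safety check the list leaves implicit: before resetting, it confirms that the tip of the feature branch really is the WIP commit, so it never un-commits real work. The git runner is injectable so the logic can be exercised without a repository.

```ruby
require "open3"

# Hypothetical helper for the WIP-commit workflow; not part of git itself.
class WipParker
  WIP_MESSAGE = "WIP"

  # runner: any callable taking an argv array and returning [stdout, success?].
  # By default it runs the command for real.
  def initialize(runner = method(:run_git))
    @runner = runner
  end

  # Steps 1-2: stage everything and record it as a throwaway commit.
  def park
    git("add", "-A")
    git("commit", "-m", WIP_MESSAGE)
  end

  # Steps 4-5: return to the feature branch and undo the WIP commit,
  # leaving its changes in the working directory (a mixed reset).
  def resume(branch)
    git("checkout", branch)
    subject = git("log", "-1", "--format=%s").strip
    unless subject == WIP_MESSAGE
      raise "tip of #{branch} is not a WIP commit (#{subject.inspect})"
    end
    git("reset", "HEAD~1")
  end

  private

  def git(*args)
    out, ok = @runner.call(["git", *args])
    raise "git #{args.join(' ')} failed" unless ok
    out
  end

  def run_git(argv)
    out, status = Open3.capture2(*argv)
    [out, status.success?]
  end
end

# Usage inside a real repository:
#   parker = WipParker.new
#   parker.park                 # before switching away
#   parker.resume("my-feature") # when you come back
```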

How it works

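Once you make your WIP commit, it sits at the tip of the feature branch, directly on top of your last real commit.

That is great for temporarily storing your in-progress work. We definitely don’t want that nasty “WIP” commit in our history long term, though. The git reset HEAD~1 command moves the feature branch (and with it HEAD) back to the commit immediately before the “WIP” commit. Because reset defaults to --mixed, the changes from the “WIP” commit stay in your working directory; they are just no longer committed or staged. That leaves the feature branch pointing at your last real commit, with the “WIP” commit dangling off to the side.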

Once you have completed your changes and committed, the feature branch will point to the new commits. This leaves the “WIP” commit out of the branch’s history forever.

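The “WIP” commit is now “unreachable” because no branch or tag leads to it. It is not deleted right away, though: the reflog still records it, so you can recover it if you need to, and git gc will not prune it until those reflog entries expire (after 30 days by default).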
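The reachability argument can be modeled in a few lines of Ruby. This is a toy model under simplifying assumptions, not Git’s implementation: commits are structs with a single parent, refs are a hash from branch name to tip commit, and the reflog is ignored entirely (in real Git, the reflog keeps the commit around for a while longer).

```ruby
# Toy model: a commit knows only its id and its parent.
ToyCommit = Struct.new(:id, :parent)

class ToyRepo
  def initialize
    @refs = {}     # branch name => tip commit
    @objects = []  # every commit ever created
  end

  def tip(branch)
    @refs[branch]
  end

  # Like `git commit`: a new commit on top of the branch tip; the branch moves to it.
  def commit(branch, id)
    c = ToyCommit.new(id, @refs[branch])
    @objects << c
    @refs[branch] = c
  end

  # Like `git reset HEAD~1` on the checked-out branch: move the ref to the parent.
  # The commit object itself is left untouched.
  def reset_one(branch)
    @refs[branch] = @refs[branch].parent
  end

  # Walk parent links from every ref; whatever is never visited is unreachable.
  def unreachable_ids
    seen = {}
    @refs.each_value do |c|
      while c && !seen[c.id]
        seen[c.id] = true
        c = c.parent
      end
    end
    @objects.map(&:id).reject { |id| seen[id] }
  end
end

repo = ToyRepo.new
repo.commit("feature", "A")    # last real commit
repo.commit("feature", "WIP")  # parked work
repo.reset_one("feature")      # back on the branch: git reset HEAD~1
repo.commit("feature", "B")    # the finished change
p repo.unreachable_ids         # prints ["WIP"]
```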

git stash definitely has its place, but I reserve it for situations where I am going to pop the stash very quickly (e.g., I stash, then check out a different branch, then pop).


  1. I do a lot of customer integration. Once a customer starts testing, it is important to keep the turnaround on their blocking issues to a minimum. If you don’t, they get distracted, and there’s no telling how long you’ll have to wait before they start testing again.


resque-fairly

I have been using Resque quite a bit recently. It is a really nice asynchronous job system based on Redis.

Resque checks the queues for jobs to process in a fixed order. (When a worker watches every queue, that order is alphabetical.) This turns out to be a problem if you want predictable handling times for jobs. For example, consider a system with queues aaa and zzz. If you add 100 jobs to aaa and 1 job to zzz, the job on zzz will not be processed until aaa has been drained, because every worker checks aaa first.

This problem is easily solved by checking the queues in random order. Over time, every queue gets its turn to be checked first, so a few deep queues cannot starve the other queues in the system.
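The difference between the two polling strategies is easy to simulate. This is a toy model, not Resque code: a single worker repeatedly scans the queues in the order a strategy returns and takes the first job it finds. `steps_until_processed` is a hypothetical helper that counts how many jobs are processed up to and including the target job.

```ruby
# Toy model: one worker, polling queues in the order returned by `order`.
# Returns how many jobs were processed up to and including `target`.
def steps_until_processed(queues, target, order)
  pending = queues.transform_values(&:dup) # don't mutate the caller's queues
  steps = 0
  loop do
    name = order.call(pending.keys).find { |q| !pending[q].empty? }
    raise ArgumentError, "target job never found" if name.nil?
    steps += 1
    return steps if pending[name].shift == target
  end
end

queues = {
  "aaa" => Array.new(100) { |i| "aaa-#{i}" },
  "zzz" => ["zzz-0"]
}

fixed  = ->(names) { names.sort }                 # fixed, alphabetical polling
rng    = Random.new(1)
random = ->(names) { names.shuffle(random: rng) } # random polling order

puts steps_until_processed(queues, "zzz-0", fixed) # prints 101

# Averaged over many runs this comes out close to 2, because zzz is
# checked first about half the time.
runs = Array.new(1_000) { steps_until_processed(queues, "zzz-0", random) }
puts runs.sum.fdiv(runs.size)
```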

resque-fairly is a Resque plugin that provides that behavior. Just install the gem, add require 'resque-fairly', and Resque will handle queues with approximate fairness.
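One plausible way for a plugin to get that behavior is to wrap the worker’s queue list. The sketch below assumes that a Resque worker polls queues in the order returned by `Resque::Worker#queues`. The `RandomQueueOrder` module is hypothetical and is not resque-fairly’s actual source. Using `Module#prepend` lets the override call `super` to get the list Resque would normally use and then shuffle it.

```ruby
# Hypothetical module: shuffle whatever queue list the worker would normally
# use, so each poll checks the queues in a fresh random order.
module RandomQueueOrder
  def queues
    super.shuffle
  end
end

# Assumed wiring in an app that has Resque loaded (not executed here):
#   Resque::Worker.prepend(RandomQueueOrder)
```

Prepending rather than replacing the method means whatever Resque’s own `queues` does (such as expanding `*`) still happens first; the module only changes the order.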