Rails vs Node.js

That title got your attention, didn’t it? While trying to make this decision myself recently I really wished some agile manifesto style value statements existed for these two platforms. Now that I have my first production deploy of a Node.js app, I’m going to give it a stab.

The Ruby on Rails community prefers:
  • Speed of development over runtime performance
  • Clarity of intent over clarity of implementation
  • Ease of getting started over ease of personal library choice
  • Freedom to customize over ease of debugging
The Express/Node.js community prefers:
  • Runtime scalability over speed of development
  • Less code over covering every use case
  • Freedom of library choice over ease of getting started

I am not suggesting that these communities don’t care about the things on the right, just that they care more about the things on the left. When faced with a tradeoff between the two values they will most often optimize for the value on the left. Of course not everyone in these communities will agree with these values, and that is a great thing. A loyal opposition is invaluable because it keeps a community from going off the deep end. It is also possible that I am wrong and there is not even general consensus about some of these. There is a particular risk of that with the Express/Node.js principles, as I am quite new to that community.

Some of these values spring, I think, from the respective languages used by the platforms. For example, clarity of intent and speed of development are strong values of the Ruby language, and that mentality has made its way into Rails as well. On the JavaScript side, freedom and simple base constructs are strong values of the language. Is it that the language we are writing in influences how we think, or that people who prefer certain values choose a language that reflects those values? One supposes that once the linguists settle that whole linguistic relativity thing we might have an answer to this question.

For what it is worth we chose to use Node.js. We had a problem domain almost perfectly suited for Node.js (IO bound and limited business logic on the server side) and we wanted to try out something new. I think the latter argument was actually the more powerful for our team.

Rails tip #72: hands off others’ private parts

In Ruby on Rails the most common way to pass data from the controller to the views is by allowing views direct access to the controller’s instance variables. Encapsulation is one of the cornerstones of software engineering. Why is it thrown out the window for views? Allowing external code access to an object’s private parts is just wrong! Seriously, god kills a kitten every time someone does this. What is worse, this anti-pattern is perpetuated in pretty much every tutorial on Rails Guides and every other RoR tutorial I have ever seen.

Forgoing encapsulation for controllers has the same issues as it has for any other type of object. You couple the implementations of the two components in a way that has very high connascence. This means that if either the view or the controller change it is very likely to require a change to the other. All of this adds up to more fragility and maintenance and less fun.

Consider this basic posts controller implemented in the current nominal RoR style


class PostsController < ApplicationController
  def index
    @posts = Post.all
  end
  
  def new
    @post = Post.new
  end

  def show
    @post = Post.find params[:id]
  end
    
  def update
    @post = Post.find params[:id]
    @post.update_attributes params[:post]
  end
  
  def destroy
    @post = Post.find params[:id]
    @post.destroy
  end
end

That repetition around finding the post sets my teeth on edge. You could fix the repetition by pulling it out into a before filter, but that complexifies the code and makes the already deep stack even deeper. The indirect invocation nature of filters makes it easy to overlook their existence, and it requires you to remember which actions they get invoked on and which ones they don’t. You even have to keep that stuff in mind when writing partials that are many levels of inclusion removed from the controller. Having to keep all that state in your brain slows you down. It also increases the risk of misremembering something and writing a view that doesn’t work.
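
For reference, the before filter version would look something like this (a sketch of the approach I am arguing against; only the relevant actions are shown):

class PostsController < ApplicationController
  # you have to remember which actions this filter covers and which it does not
  before_filter :find_post, :only => [:show, :update, :destroy]

  def update
    @post.update_attributes params[:post]
  end

  def destroy
    @post.destroy
  end

  private

  def find_post
    @post = Post.find params[:id]
  end
end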

Fortunately, Rails has a good solution to this problem. Consider the following refactor


class PostsController < ApplicationController
  def update
    post.update_attributes params[:post]
  end
  
  def destroy
    post.destroy
  end
  
  protected

  # memoized accessors, exposed to the views via helper_method; keeping them
  # non-public prevents Rails from treating them as actions
  def post
    @post ||= if params[:id]
                Post.find params[:id]
              else
                Post.new
              end
  end
  helper_method :post

  def posts
    @posts ||= Post.all
  end
  helper_method :posts
end

And an accompanying view partial


<h1><%= post.title %></h1>
...

Now we have two efficiently memoized accessor methods for the data we want to expose to the views. The resulting code is better in several ways

  • The code is DRYer. If we want to implement visibility control for posts we can do it in exactly one place.
  • The logic of the action is easier to follow because the data lookup code is called explicitly.
  • The views and controller are less connascent. For example, if the show view wants to display a list of all posts in the side bar, only the view needs to change, not the controller.
  • The code is easier to keep efficient because those database lookups only happen if they are needed. There is little chance of looking up data that is not used by anyone. The lookups only happen if the controller or views explicitly request the data.
  • The view code is more reusable. If you want to reuse that partial in another view (say by fully displaying the ten most recent posts in the index view) you can easily do so by rendering it with a post local variable.
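
For instance, the index view could reuse the partial like this (a sketch; it assumes the partial above is named _post, so the collection render passes each element to it as the post local):

<%# each element of the collection is rendered with its own `post` local,
    which shadows the helper method of the same name %>
<%= render :partial => 'post', :collection => posts.first(10) %>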

Encapsulation is just as good a policy for controllers as it is for models.

Is ruby immature?

A friend of mine recently described why he feels ruby is immature. I, of course, disagree with him. There is much in ruby that could be improved, but the issues he raised are a) intentional design choices or b) weaknesses in specific applications built in ruby. Neither of those scenarios can be fairly described as immaturity in the language, or the community using the language.

Set

Mr. Jones’ main example is one regarding the Set class in ruby. In practice Set is a rarely used class in ruby. I suspect it exists primarily for historical and completeness reasons. It is rather rare to see idiomatic ruby that utilizes Set.1

This is possible because Array provides a rather complete implementation of the basic set operations. Rubyists are very accustomed to using arrays, so it is more common to just use the set operators on arrays than to convert the arrays into sets.

The set operations on Array do not have the same performance characteristics Mr. Jones found with Set. For example,

$ time ruby -rpp -e 'pp (1..10_000_000).to_a & (1..10).to_a'
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

real	0m10.152s
user	0m6.592s
sys	0m3.515s

$ time ruby -rpp -e 'pp (1..10).to_a & (1..10_000_000).to_a'
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

real	0m12.410s
user	0m8.397s
sys	0m3.860s

Order still matters, but very much less. (That is on 1.8.6, the only version I have handy at the moment. I am sure that 1.9, or even 1.8.7, would be quite a bit faster.)

Libraries that are low traffic areas don’t get the effort that high use libraries do, in any language. Even though Set is part of the standard library, it definitely counts as a low traffic area. Hence, it has never been optimized for large numbers of items. This is appropriate because, as we learned from Rob Pike, “n is usually small”. The benefit of handling large sets performantly is not worth the additional complexity for a low traffic library.

nil

In his other example Mr. Jones implies that the fact that nil is a real object is disadvantageous. On this count he is simply incorrect. Having nil be an object allows significant reductions in the number of special cases that must exist. This reduction in special cases often results in less code, but it always results in less cognitive load.

Consider the #try method in ruby. While not my favorite implementation of this concept, it is still a powerful idiom for removing clutter from the code.

#try executes the specified method on the receiver, unless the receiver is nil. When the receiver is nil it does nothing. This allows code to use a best effort approach to performing non-critical operations. For example2,

def remove_email(email)                                                                                         
  emails.find_by_email(email).try(:destroy)                                                                     
end  
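
For comparison, here is what the same method might look like without #try (a hypothetical equivalent; the explicit nil check is exactly the clutter #try removes):

def remove_email(email)
  record = emails.find_by_email(email)
  record.destroy unless record.nil?
end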

This is implemented as follows:

module Kernel
  # Kernel is mixed into Object, so every non-nil object picks up this #try
  def try(method, *args, &block)
    send(method, *args, &block)
  end
end

class NilClass
  def try(*args)
    # do nothing
  end
end

You could implement something like #try in a system that has a non-object “no value” mechanism. It would be less elegant and less clear, though. (It would probably be less performant too, because method calls tend to be optimized rather aggressively.) Having nil be an object like everything else is one less primitive concept that the code and the programmer must keep in mind.

Mr. Jones does bring up the issue of nil.id returning 4 and that value being used as a foreign key in the database. This is not a problem I see very often, but it can happen.

This is definitely not a problem with ruby, though. Rather, it results from an unfortunate choice of naming convention in Rails. Rails uses id as the name of the primary key column for database tables. This results in an #id method being created, which overrides the #id provided by ruby itself for all objects. If Rails had chosen to call the primary key column something that did not conflict with an existing ruby core method – say pk – we would not be having this discussion.

In general

Mr. Jones asserts that “ruby is rife with happy path coding”. I disagree with his characterization. The ruby community has a strong bias towards producing working, if incomplete, code and iterating on that code to improve it. This “simplest thing that could work” approach does result in the occasional misstep and suboptimal implementation. In return you get to use a lot of new stuff more quickly, and when there are problems they are easier to fix because the code is simpler.

The ruby community has strongly embraced the small pieces, loosely joined approach. This is only accelerating the innovation in ruby. Gems have lowered the friction of distributing and installing components to previously unimaginable levels. This has allowed many libraries that would have been too small to be worth releasing in the past to come into existence.

Rack, with its middleware concept, is an example of the ruby community taking much of the Unix philosophy and turning it up to 11. While Rails has much historic baggage, even it is moving to a much more modular architecture with the upcoming 3.0 release.

Following these principles does result in some rough edges occasionally, but the benefits are worth the trade. The 80% solution is how Unix succeeded. An 80% solution today is better than a 100% solution 3 months from now (as long as you can improve it when needed). We always have releases to get to, after all.


  1. I, on the other hand, do use Set rather more than the average rubyist. Set is a rather performant way of producing collections without duplicate entries.

  2. Shamelessly copied from Chris Wanstrath.

Nucleic Teams

A nucleic team is one with a small core group of permanent employees, usually just 1 to 3 people, that is supplemented as needed by contractors. The core in a nucleic team is too small to do the anticipated work, even during slow periods of development. The core team’s job is twofold: first, it implements stories that are particularly complicated, risky or architecturally important. The second role of the core team is to manage a group of contractors by creating statements of work, doing code reviews, etc.

The nucleic structure should provide a lot of advantages from a business standpoint. You get many of the benefits of having an in-house development team: developers that have the time and incentives to become domain experts, a consistent group of people with which all the stakeholders can build a rapport, and a group of people that work together long enough to build the shared vision it takes to create systems with conceptual integrity.1

Those advantages are combined with the advantages of a pure contracting team, at least in principle. The primary advantages of pure contracting are that you can scale the development organization, both up and down, rapidly and cost effectively. Many organizations with in-house development teams end up having to maintain a sub-optimally sized development team. Workloads and cash flow tend to vary a bit over time. It takes a long time to find and hire skilled developers, and once you do, it really sucks to have to lay people off, whether because of a lack of work or a lack of money. Resizing development teams is so costly and disruptive that most organizations tend to pick a team size that is larger than optimal for the slow/lean times but smaller than optimal for the plentiful times.

Risks

This structure is not without its risks, though. Finding talented contractors is not easy. Contractors, by their very nature, cannot be relied on when planning beyond their current contract. Most importantly, though, contracting usually has an incentive structure that favors short term productivity. All of these can threaten the long term success of a project if not managed correctly.

To counteract the risks inherent in contract workers, the core team must be committed to the business, highly talented and fully empowered by the executive team to aggressively manage the contractors. The core team members must be highly skilled software developers, of course, but this role requires expertise in areas that are significantly different from traditional software development. The ability to read and understand other people’s code rapidly is of huge importance, as is the ability to communicate to both the business and the contractors what functionality is needed. The core team also needs to be able to communicate much more subtle, squishy things like the architectural vision and development standards.

The core team will not be as productive at cutting code as they might be used to. The core team role is not primarily one of coding. A significant risk is that the members of the core team might find that they do not like the facilitation and maintainership role nearly as much as cutting code. It is necessary to set the expectations of candidates for the core team appropriately. One other risk is that the core team will get so bogged down in facilitation and maintainership tasks that they actually stop cutting code. The “non-coding architect” is a recipe for disaster, and should be avoided at all costs.

While this team structure has much going for it, it will be challenging to make work in practice.

Origins

I think this team structure is developing in the Rails community out of necessity, rather than preference. Rails is a highly productive environment. That can make it a competitive advantage for organizations that use it. However, the talent pool for Ruby and Rails is rather small. Additionally, many of the people who are highly skilled at Rails prefer to work as contractors. The percentage of the Rails talent pool that prefers to be independent seems quite high in comparison to any other community I know of.

This raises a problem for organizations that would like to create an in-house development team using Rails. Most of the talent would rather not work for you, or anyone for that matter. However, if you can build a small core team to manage the development and hold the institutional knowledge for the project you can utilize the huge talent that exists in the Rails contractor community to drive the project to completion.

I am not sure if this structure and the reasons behind it are good, or bad, for the Rails community as a whole. The nucleic team model might turn out to be a competitive advantage in itself because it embodies the benefits of both internal and external development teams. On the other hand, it is bound to be a bit off-putting for organizations that are not used to it.


  1. See Mythical Man Month by Fred Brooks for more details on the importance of conceptual integrity.

Versioning REST Web Services (Tricks and Tips)

In my previous post on this subject I described an approach to versioning the API of a REST/HTTP web service. This approach has significant advantages over the approach that is currently most common (i.e. embedding a version token in the URL). However, it does have some downsides. This post is an attempt to outline those and to present some ways to mitigate the negative impacts.

Nonstandard MIME media types

Using content negotiation to manage versions requires, by definition, the introduction of nonstandard media types. There is really no way around this. I personally don’t feel this is a Bad Thing. The new, nonstandard, media types do a much better job describing the sort of media the client is requesting. It does, however, mean that browsers – and perhaps some HTTP tools – will work less well with the web service.

The browser not working is a pretty big issue. Browsers are almost certainly not the target consumers of the service, but having the browser not work raises the level of effort for exploring the API. If you have created a cool new service you want as few barriers to entry as possible. Personally, I always use curl when I am exploring, but I know several people who would prefer to use a browser.

Unfortunately, I don’t really have a great general solution for browsers. That being said, in many situations much can be done to make life better. For example, if the resources in question do not have HTML representations you could serve the current preferred format to browsers with a generic content type that they can render – e.g. text/plain or application/xml.
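
A Rails controller might do that fallback along these lines (a sketch only; the resource is made up, and it assumes the versioned media type has been registered under a short name such as mf1, as discussed in the Curl section below):

def show
  @hello_world = HelloWorld.find(params[:id])

  respond_to do |format|
    # clients that explicitly ask for the versioned media type get it as such
    format.mf1  { render :xml => @hello_world.to_xml }
    # browsers ask for text/html; serve the same document with a generic
    # content type they know how to display
    format.html { render :xml => @hello_world.to_xml, :content_type => 'application/xml' }
  end
end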

Curl

One advantage of having the version token directly in the URL is that it makes it really easy to use curl against the service. By default curl makes requests with the Accept header field set to */*. For a reasonably designed service this would result in a response in the current preferred format. If you want to change the Accept header you need to invoke curl like this

curl --header 'Accept: application/vnd.foo.myformat-v1+xml' http://api.example/hello-world

That is not too horrible, really. It is a bit much to type all the time, but I have curl rc files for all the formats I deal with on a daily basis. If your service is implemented in Rails there is an even easier way. With Rails you give each format you support a short name that may be used as an “extension” for URLs. For example, if we define the short name for application/vnd.foo.myformat-v1+xml to be mf1 we can say this

curl http://api.example/hello-world.mf1

That is equivalent, from the point of view of a Rails based service, to the previous example. I imagine similar functionality could be implemented in most web frameworks. This effectively puts you back to having the version embedded in the URL, which is convenient for debugging and exploration. (It is still unsuitable for production use, though, for all the same reasons as other approaches to embedding the version in the URL.)
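
For completeness, that short name is registered with a one-liner, typically in an initializer:

Mime::Type.register 'application/vnd.foo.myformat-v1+xml', :mf1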

Nonobviousness

Another potential downside of using content negotiated versioning is that the various versions may be less discoverable compared to a version-in-the-URL approach. I am not entirely sure this is true – after all, there is a version token in the media type – but if it is true it would be a Good Thing.

Do you really want people “discovering” a version of the API that was deprecated a year ago? I think it might be better, in either approach, to use version tokens that are not readily guessable. Obviously, previous versions of an API will be documented and remain accessible, but raising some barriers to entry on deprecated parts of a system seems appropriate to me.

Unfamiliarity

This may be the biggest issue of all. People are just not very familiar, and therefore comfortable, with content negotiation. This is in spite of the fact that it has been a fundamental part of HTTP since forever. I think this feature’s obscurity is waning now, though, because it is such a powerful feature.

Two years ago Rails got content negotiation support. (That link seems to be broken at the moment. You can see part of the post I am talking about by going here and searching for “The Accept header”.) As frameworks like Rails keep adding and improving their support for this powerful feature the community of developers will become more familiar and comfortable with it. What is needed now is more education in the community on how best to utilize this feature.


If you’re interested in REST/HTTP service versioning be sure not to miss the rest of the series.

ActiveRecord race conditions

Ara Howard has discovered that the ActiveRecord validation mechanism does not ensure data integrity.1 Validations feel a bit like database constraints but it turns out they are really only useful for producing human friendly error messages.

This is because the assertions they define are tested by reading from the database before the changes are written to the database. As you will no doubt recall, phantom reads are not prevented by any isolation mode other than serializable. So unless you are running your database in serializable isolation mode (and you aren’t, because nobody does), the use of ActiveRecord validations sets up a classic race condition.

On my work project we found this out the hard way. The appearance of multiple records that should have been blocked by the validations was a bit surprising. In our case the impacted models happened to be immutable, so we only had to solve this for the #find_or_create case. We ended up reimplementing #find_or_create so that it does the following (a sketch follows the list):

  1. Do a find.
    • If a matching record is found, return the model object.
    • If no matching record exists, create a savepoint.
  2. Insert the record.
    • If the insert succeeds, return the new model object.
    • If the insert fails, roll back to the savepoint.
  3. Re-run the find and return the result.
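
A rough sketch of that flow, assuming a unique database constraint on the column in question and an ActiveRecord version with savepoint support (the email column and method names are purely illustrative):

def self.find_or_create_by_email(email)
  # 1. do a find; if a matching record exists we are done
  found = find_by_email(email)
  return found if found

  created = nil
  begin
    # 2. insert inside a savepoint so a failed insert does not poison
    #    any enclosing transaction
    transaction(:requires_new => true) do
      created = create!(:email => email)
    end
  rescue ActiveRecord::StatementInvalid
    # the unique constraint rejected the insert; another process won the race
  end

  # 3. return the new record, or re-run the find to pick up the record
  #    the other process inserted
  created || find_by_email(email)
end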

This approach does require the use of database constraints, but having your data integrity constraints separated from the data model definition has always felt a bit awkward to me. So I think this is more of a feature than a bug.

It would be really nice if this behavior were included by default in ActiveRecord. A similar approach could be used to remove the race conditions in regular creates and saves by simply detecting the insert or update failures and re-executing the validations. This would not even require that the validations/constraints be duplicated. The validations could, in most cases, be generated mechanically from the database constraints. For example, DrySQL already does this.

Such an approach would provide the pretty error messages Rails users expect, neatly combined with the data integrity guarantees that users of modern databases expect.


  1. You simply must love any sample code that has a method Fubar.hork_the_db.

Things to be Suspicious Of — attr_accessor_with_default with a collection

My team ran into this problem yesterday when a particular, very important, request was failing in one of our Rails apps. The failure did not make much sense and, even more confusingly, the same code worked perfectly in the console. As part of debugging the problem we restarted the mongrel cluster, and suddenly everything worked again.

I hate it when the symptoms go away before you have a chance to diagnose the root cause of a problem. It is still out there waiting to bite you again, and you have no idea what actually causes it. Well, after quite a while looking at the code, I noticed a bit of code similar to this:

class Widget < ActiveRecord::Base
  attr_accessor_with_default :merge_queue, []

  def merge(thing)
    merge_queue << thing
  end 

  def do_pending_merges
    # shift each thing off merge_queue and merge it into self 
  end 
end

Looks innocent enough. However, the attr_accessor_with_default is the problem. You can normally think of attr_accessor_with_default as shorthand for something like

def merge_queue
  []
end

However, this is not strictly speaking true. As written above, the default value for all instances of Widget#merge_queue is the exact same Array object. So rather than merge_queue behaving like a private instance variable, it acts more like a shared class variable. This means that any time you add something to the merge_queue of one instance of Widget you are adding it to the default value of merge_queue for all current and future instances of Widget.
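
A quick (hypothetical) console session with the Widget class above makes the sharing visible:

a = Widget.new
b = Widget.new

a.merge_queue << :pending_thing

b.merge_queue                        # => [:pending_thing]
a.merge_queue.equal?(b.merge_queue)  # => true, the very same Array object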

This turned out to be our problem. Restarting the mongrel cluster made the symptoms go away because merge_queue’s default value was no longer erroneously pre-populated, and the code worked in the console for the same reason. We had never noticed the issue because when #do_pending_merges worked correctly it emptied merge_queue as it went. However, when more than one merge happened simultaneously or, as in our case, a merge failed, the shared merge_queue default value ended up containing some erroneous items.

This attribute accessor should have been written like

attr_accessor_with_default(:merge_queue){[]}

In this form, the block is evaluated each time a default value is needed meaning that each instance of Widget would have gotten its very own brand new empty array.

So the moral is: be very, very suspicious if you see an attr_accessor_with_default with a default value that is a collection. It is possible that it is correct, but it is not very likely. More likely, the original author did not realize that the exact same instance would be used as the default value each time the attribute accessor was called.

Decouple the File in your Rails Plugin, Please

Defining new behavior for core Rails classes in mixins is a common pattern in Rails plugins. This allows for a separation of concerns that improves maintainability and digestibility. However, it raises a bit of a question about where the mixin inclusion step should take place. Should it happen in the plugin’s init.rb or in the same file in which the mixin module is defined?

I recently had cause to use a couple of Rails plugins outside of a Rails project. This experience has allowed me to resolve this question, and the answer is: include the new behavior in the core classes in the same file in which you define the behavior. That way, if you need to use a subset of the behaviors defined in a plugin you can just require that file. The people who use your plugin in ways you don’t yet anticipate will thank you.
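
As a sketch (the plugin, module, and method names are made up), the plugin’s lib file might look like this, with init.rb reduced to a single require:

# vendor/plugins/acts_as_chunky/lib/acts_as_chunky.rb
module ActsAsChunky
  # define the modules explicitly rather than relying on ActiveSupport's
  # dependency-based autoloading to create them
  module InstanceMethods
    def chunky?
      true
    end
  end
end

# include the new behavior right here, in the same file that defines it, so
# requiring this one file is enough to use the plugin outside of Rails
ActiveRecord::Base.send :include, ActsAsChunky::InstanceMethods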

Oh yeah, and please don’t rely on the automatic module creation that dependency-based autoloading in ActiveSupport provides. It is a lot simpler for your users if you explicitly define all the modules you need.

Hierarchical Resources in Rails

Consider a situation where you have a type of resource which always belongs to a resource of another type. How do you model the URI space using Rails? For example, say you have an address resource type. An address is always associated with exactly one user, but a user may have several addresses (work, home, etc).

The simple approach

The simplest approach from a Rails implementation perspective is to just have a flat URI space. In this scenario the URI for the collection of addresses associated with a user and a particular address would be, respectively:

http://example.com/addresses?user_id={user_id}
http://example.com/addresses/{address_id}

From a REST/Web arch standpoint there is absolutely no problem with these URIs. They are a bit ugly for the humans around, though. Worse yet, one might reasonably infer from them that http://example.com/addresses references the collection of all the addresses known to the system. While that might be nice from an information modeling point of view, in reality that collection is probably going to be too large to return as a single document. To be fair, it would be perfectly legal to respond to /addresses with a 404 or 403, but it would be a bit surprising to get that result if you were exploring the system.

The fully hierarchical approach

Edge Rails contains some improvements to the resource oriented route generators. One of the changes adds support for sub-resources. Sub-resources are supported via the :has_many and :has_one options to ActionController::Routing::Map#resources. These options produce fully hierarchical URIs for the resources. For example

http://example.com/users/{user_id}/addresses
http://example.com/users/{user_id}/addresses/{address_id}

The first URI references the collection of all the addresses for the specified user. The second URI references a particular address that belongs to the specified user. These URIs are very pretty, but they add some complexity to the controllers that fulfill them.
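
For reference, declaring these routes is a one-liner in routes.rb:

ActionController::Routing::Routes.draw do |map|
  # generates /users/{user_id}/addresses and /users/{user_id}/addresses/{address_id}
  map.resources :users, :has_many => :addresses
end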

The additional complexity stems from the fact that address_id is unique among all addresses (in most cases it would be an automatically generated surrogate key). This leads to the potential for the address_id to be valid while the address it identifies does not belong to the user identified by user_id. In such cases the most responsible thing to do is to return a 404, but doing so takes a couple of extra lines in each of the actions that deal with individual addresses.
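
The guard in each member action ends up looking something like this (a sketch; the association and helper names are illustrative):

def show
  user = User.find params[:user_id]
  @address = user.addresses.find_by_id params[:id]
  # a valid address_id that belongs to a different user gets a 404
  head :not_found unless @address
end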

The semi-hierarchical approach

After trying both of the previous approaches and finding them not entirely satisfactory, I have started using a hybrid approach. The collection resources are defined below the resources to which they belong, but the collection member resources are referenced without an intermediate. For example

http://example.com/users/{user_id}/addresses
http://example.com/addresses/{address_id}

This has the advantage of producing fairly attractive URIs across the board. It also provides an obvious location to add a collection resource containing all the child resources, if you have a need for looking at all of them without the parent resource being involved. And it does not require any extraneous code in the controllers to deal with the possibility of the specified parent and child resources being unrelated.

On the downside, it does require some changes to the routing system to make defining such routes simple and maintainable. Also, it might be a bit surprising if you are exploring the system. For example, if you request http://example.com/addresses/ you will get a 404, which is probably not what you would expect.

Even with the disadvantages mentioned above, I am quite pleased with how the URIs and controllers turn out using this technique. If you are looking for a way to deal with hierarchical resources you should give it a try.

Rake is Sweet

Rake is a really excellent build tool. It is basically Make on steroids (and minus a few of the annoying inconveniences of make). If you build software of any sort you owe it to yourself to check out Rake.

The source of my Rake related euphoria today is that I just used a feature of Rake that is not available in any other build tool that I know of. Namely, I added an action to an existing task1. This feature allows you to extend the behavior of a task that you do not directly own, say, for example, one defined by the framework you are using.

My particular situation was this. I have some data that is absolutely required for the application to function (permissions data in this case). Changes to this data don’t happen at run-time and the code explicitly references these records, which means that while this information is stored in the database it is more akin to code and the data model than it is to the data managed by the application.

Given that this data is referenced explicitly by the code, it must reside in source control. Rails migrations are an excellent way to manage changes to the data model of an application and, as it turns out, the foundation data too. If you need to add or change some of this foundation data you can just write a migration to add, update or delete the appropriate records.

There is one slight issue with using migrations to manage foundation data, though. Only the structure of the development database gets automatically copied to the test database. So the code that requires the foundation data will fail its tests because that data does not exist. I have run into this problem before. That time I solved it by changing the way Rails creates the test database such that it used the migrations rather than copying the development database’s structure. It is a very nice approach but unfortunately it does not work for my current project.

To solve my problem this time I simply added an action to the db:test:prepare task to copy the data from the roles table. The standard db:test:prepare task provided by Rails dumps the development database’s structure and then creates a clean test database using that dump. For our project it still does that, but it then follows up by dumping the data from the roles table and loading that into the test database also.
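
The extension itself is just a second definition of the same task; Rake appends the new action to the ones Rails already defined. Something along these lines (the dump and load commands here are illustrative, not our exact implementation):

namespace :db do
  namespace :test do
    # re-opening db:test:prepare adds this action after the existing ones,
    # so it runs once the clean test schema is in place
    task :prepare do
      dev  = ActiveRecord::Base.configurations['development']['database']
      test = ActiveRecord::Base.configurations['test']['database']
      system("mysqldump --no-create-info #{dev} roles | mysql #{test}")
    end
  end
end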

Extending the db:test:prepare task means that all the tasks get an appropriate test database when they need it, without me having to go around and add a dependency to all of them. I love it when my tools let me solve my problems easily.


  1. A Rake task is equivalent to a target in Make and Ant.