RESTful resource creation (redux)

Benjamin Carlyle has posted a followup about using PUT to create new resources in which he brings up some interesting issues.

First, it seems I misunderstood his original idea slightly. My misunderstanding does not affect how I feel about his approach much. I don’t like the idea of PUT-ting to a “factory” resource with a GUID in the query string any more than I like PUT-ting to a GUID-based URI that the server might actually be able to use. In fact, I think I might like it less. On the other hand, PUT-ting to a “factory” is really the same thing I proposed in response to Mr Carlyle’s original post; I just left off the GUID bit. I find GUIDs to be slightly repulsive and I really don’t see any need for them in the approaches being discussed.

Response codes

Mr Carlyle also points out that the HTTP spec demands a 301 (Moved Permanently) redirect be used if the server wants a PUT applied to a different URI. Unfortunately, that does not really match the semantics of redirecting from a factory/URL generator resource. This occurred to me when I was writing my proposal for safe resource creation (which is really just Mr Carlyle’s proposal without the GUID), but I punted and did not even mention the issue. A possible solution might be to use an extension code in the 300 series to mean “URI Reserved”. This would mean a PUT request to a factory/URL generation resource would receive a response like

HTTP/1.1 372 URI Reserved
Location: http://example.com/yourNewResource

The semantics are nice and clean but it has the disadvantage of being non-standard. This sort of “extension code” is explicitly supported in the HTTP spec but it does require that clients be customized to understand it.
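To make the cost of a non-standard code concrete, here is a minimal Python sketch of the decision a creating client has to make. The `next_action` function and its return values are hypothetical, and 372 is only the code proposed above, not a registered status. The fallback follows RFC 2616 section 6.1.1, which says a client that does not recognize a status code must treat it as the x00 code of its class.

```python
# Sketch: how a resource-creating client might react to the proposed
# (non-standard, hypothetical) "372 URI Reserved" response.

STANDARD_CODES = {200, 201, 204, 300, 301, 302, 303, 307}


def next_action(status, headers, understands_372=True):
    """Return (action, uri) telling the client what to do after a PUT."""
    known = set(STANDARD_CODES)
    if understands_372:
        known.add(372)
    if status not in known:
        # RFC 2616 6.1.1: treat an unrecognized code as the x00 code of its class.
        status = (status // 100) * 100
    if status == 372:
        # URI reserved: re-issue the same PUT against the reserved URI.
        return ("repeat_put", headers["Location"])
    if status in (200, 201, 204):
        return ("done", None)
    if 300 <= status < 400:
        # Generic redirect handling; an unaware client ends up here for 372.
        return ("generic_redirect", headers.get("Location"))
    return ("error", None)


reserved = {"Location": "http://example.com/yourNewResource"}
print(next_action(372, reserved))
# ('repeat_put', 'http://example.com/yourNewResource')
print(next_action(372, reserved, understands_372=False))
# ('generic_redirect', 'http://example.com/yourNewResource')
```

A client that has not been customized lands in the generic redirect branch and treats the response like a 300 Multiple Choices, which is why the approach needs client cooperation.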

Leveraging POST

Stefan Tilkov proposes an alternate approach. His idea involves a POST request to a URI generation service, which would then return the new URI in the Location: header. Totally workable. It requires a significant, though not disastrous, level of coupling between the server and client. The approach loses some of its tidiness if the server would like to use a natural key mechanism for the URIs (and it is positively messy if the server would like to transition from a generated key to a natural key). For naturally keyed URIs to be generated, the POST request to the URI generation service would have to include the complete representation you want to store. This is fine in practice, but it makes it look even more like the PUT-based approaches.
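Here is a minimal in-memory Python sketch of the POST-based URI generation idea. It assumes a hypothetical `UriGenerator` class and a hypothetical `po_number` field as the natural key. It shows why natural keys force the POST to carry the full representation: the server cannot derive the key from nothing.

```python
import itertools


class UriGenerator:
    """Hypothetical URI generation service: POST returns a Location, stores nothing."""

    def __init__(self, base="http://example.com/purchase-orders/", natural_key=None):
        self.base = base
        self.natural_key = natural_key  # e.g. "po_number" (hypothetical field name)
        self._counter = itertools.count(1)

    def post(self, representation=None):
        """Return the URI the client should later PUT the representation to."""
        if self.natural_key is None:
            # Generated key: the request body is irrelevant.
            return self.base + str(next(self._counter))
        if representation is None:
            # Natural key: the server needs the full representation to derive it.
            raise ValueError("natural-key URIs require the representation in the POST")
        return self.base + str(representation[self.natural_key])


generated = UriGenerator()
print(generated.post())  # http://example.com/purchase-orders/1

natural = UriGenerator(natural_key="po_number")
order = {"po_number": "A-17", "item": "widget"}
print(natural.post(order))  # http://example.com/purchase-orders/A-17
```

In the natural-key case the client sends the same representation twice: once to the generator and once in the follow-up PUT.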

What does PUT mean

One of the reasons Mr Tilkov does not like the redirected PUT approach to resource creation is that

> it violates the original purpose of PUT, though — if I PUT to a URI, I don’t expect it to have different results each time I do so

There is certainly one way to look at the redirected PUT request that is a little out of sync with the canonical PUT semantics, but I don’t think it has anything to do with the results of the request. The semantic problem I see is that a PUT is a request to store an entity at a particular URI. In redirected-PUT-based resource creation, the new entity will never be stored at the initial URI to which it is PUT.

This is not as big an issue as it seems at first glance, however. If you think of the initial URI used for resource creation as pointing to the next unused slot in a collection of resources, rather than as a resource factory, the semantics line up much more cleanly. From this point of view, PUT-ting an entity to the “next empty slot” URI and being redirected to a permanent URI for that slot fits rather nicely with normal PUT semantics. The redirect is necessary because once a slot is spoken for, the “next slot” URI, by definition, points someplace else.

This way of thinking about the new resource is similar to a “latest” URI. No one would quibble about a resource with a URI like http://example.com/blog/latest. The response to a GET of a URI like this would change often, and those changes would result from the change in state of some other resource (namely the posts collection resource). The important thing to keep in mind is that the resource in question is just the most recent post. Similarly, a “next slot” resource always points to the next new member of a resource collection.

If you choose to use this way of thinking about resource creation, perhaps a URI whose purpose is slightly harder to confuse would be helpful. Say, something like http://example.com/purchase-order/next-new, though I am not sure that is really better.

Better resource creation

It seems that I was a little unclear in my post about using the HTTP PUT method for resource creation, so let me try again.

In this post Benjamin Carlyle points out that using POST for resource creation has some serious flaws. To see the main problem with using POST for resource creation consider the following scenario.

You make a POST request to create a new purchase order resource but you don’t get a response. One of two things may have happened.

  1. the server got your request and created the purchase order

  2. the server never received the request, or crashed before it was able to create the new purchase order

If 1 happened you don’t want to re-issue the new purchase order request, because that would double the order you just made. If 2 happened you do want to re-issue the new purchase order request; otherwise you are not going to get the stuff you are trying to order. Unfortunately, with POST-based resource creation there is no way to tell which way the request failed, and therefore no way to decide what to do to resolve the problem without some outside (read: human) input.

It should be noted that when a human is already involved (such as in a user-facing web app) this is really not much of a problem. If you try to create a new blog post but don’t get a response, you just go look at your blog to see if it actually worked and then do the appropriate thing. However, when you get into computer-to-computer interactions this issue is a big deal, and in many situations it borders on completely unacceptable.

An alternate approach

One way to handle resource creation that does not suffer from these issues is to use PUT instead of POST. PUT requests are idempotent by definition, so any time you make a PUT request and don’t get a response you can just re-issue the request until you do. Using that characteristic to implement resource creation would look something like this:

  1. A client needing to create a new purchase order makes a PUT request, containing a representation of the purchase order to create, to a well-known new resource URI (something like http://example.com/purchase-orders/new).

  2. The server generates a new URI to reference this new purchase order, using whatever mechanism it chooses, and responds with a redirect to this new URI.

  3. The client re-issues the PUT request containing the purchase order representation to the newly generated URI.

  4. The server stores the new purchase order and responds with a “201 Created”.

In this scenario there are two points at which you could get no response from the server, but at each of those points there is exactly one correct thing to do to resolve the failure. If you don’t get a response to the initial PUT to the “new resource” URI, you can just re-issue it until you do get a response. Because the server never actually creates a resource at this point, it does not matter whether the server processed your request and the response got lost, or your request never made it to the server. By re-issuing the request, the worst thing that can happen is that a few URIs are generated that will never get used, but URIs are free, so that is no problem at all. Once you have the redirect from the initial “new resource” request, you can re-issue the PUT against the redirect URI. If no response is received, you can simply re-issue the request until you do get one. The idempotence guarantee of PUT requests means that making the same request multiple times has the same effect as making it once.
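The steps above can be sketched as a small, self-contained Python simulation. `PurchaseOrderServer`, `LossyTransport`, `put_until_answered` and `create_purchase_order` are hypothetical names. The transport always delivers requests to the server but can be told to lose specific responses, which models the “server processed it but the response got lost” case. The server answers with 303, as in the example later in this post, but the client accepts any 3xx redirect carrying a Location.

```python
import itertools


class PurchaseOrderServer:
    """In-memory stand-in for a server using redirected-PUT creation."""

    NEW = "/purchase-orders/new"  # the well-known "new resource" URI

    def __init__(self):
        self.orders = {}
        self._ids = itertools.count(1)

    def put(self, uri, body):
        if uri == self.NEW:
            # Only mint a URI; nothing is stored, so repeats merely waste ids.
            return 303, {"Location": f"/purchase-orders/{next(self._ids)}"}
        created = uri not in self.orders
        self.orders[uri] = body  # storing the same body again changes nothing
        return (201 if created else 200), {}


class LossyTransport:
    """Delivers every request, but drops the responses to the calls in `lose`."""

    def __init__(self, server, lose=()):
        self.server = server
        self.lose = set(lose)
        self.calls = 0

    def put(self, uri, body):
        self.calls += 1
        response = self.server.put(uri, body)  # the server always sees the request
        return None if self.calls in self.lose else response


def put_until_answered(transport, uri, body, max_attempts=10):
    """Re-issue an idempotent PUT until some response arrives."""
    for _ in range(max_attempts):
        response = transport.put(uri, body)
        if response is not None:
            return response
    raise TimeoutError(f"no response for PUT {uri}")


def create_purchase_order(transport, body):
    # Steps 1-2: PUT to the "new" URI and wait for the redirect.
    status, headers = put_until_answered(transport, PurchaseOrderServer.NEW, body)
    if not 300 <= status < 400:
        raise RuntimeError(f"expected a redirect, got {status}")
    uri = headers["Location"]
    # Steps 3-4: PUT the same body to the minted URI.
    status, _ = put_until_answered(transport, uri, body)
    return uri, status


server = PurchaseOrderServer()
transport = LossyTransport(server, lose={1, 3})  # lose one response in each phase
print(create_purchase_order(transport, {"item": "widget"}))  # ('/purchase-orders/2', 200)
print(len(server.orders))  # 1
```

The lost first response burns `/purchase-orders/1`, which is never used. The lost 201 means the client sees 200 on its retry, but exactly one purchase order exists.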

This approach is adapted from a very similar approach suggested by Benjamin Carlyle. My primary concern with the approach Mr. Carlyle suggested is that it forces either the client of a RESTful web service to understand the URI generation scheme used by the server, or the server to understand a URI generation scheme it has no intention of actually using. The GUID-based URIs Mr. Carlyle suggests are workable, but I think they force far too much knowledge about the implementations of the client and server into each other. This knowledge would cause the server, clients, or both to become more complicated than necessary.

Deprecating POST

Benjamin Carlyle has an interesting bit about the possibility of deprecating the HTTP POST method. I think most people who have thought deeply about RESTful architectures have had similar thoughts. GET, PUT and DELETE are all nicely idempotent, but POST is not. GET, PUT, and DELETE have clean, well-defined semantics, but POST does not. POST generally seems a little out of place next to its cleaner-cut cohorts. However, there is an absolute requirement for the “process this” semantics of POST, so deprecating it is completely out of the question. On the other hand, it does get used in some situations where there are better approaches.

POST’s lack of idempotence has some nasty side effects, particularly for what is probably the most common use of POST today: new resource creation. Consider the following scenario: you POST a request to create a new resource but you don’t get a response. It is impossible to automatically recover from this scenario. You cannot resend the request, because the new resource may have been created and you just did not get the response. And you cannot check whether the resource was created, because you don’t know the URI it would have been assigned if it had, in fact, been created.
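For contrast, here is a tiny Python simulation of the failure described above, using a hypothetical in-memory `OrderServer`. The first POST is processed but its response is lost. The obvious recovery, a blind retry, creates a duplicate. The only thing that would have let the client check, the Location of the first order, is exactly what was lost.

```python
import itertools


class OrderServer:
    """Hypothetical server that creates a new resource on every POST."""

    def __init__(self):
        self.orders = {}
        self._ids = itertools.count(1)

    def post(self, body):
        uri = f"/purchase-orders/{next(self._ids)}"
        self.orders[uri] = body
        return 201, {"Location": uri}


server = OrderServer()
order = {"item": "widget"}

server.post(order)                    # processed, but pretend the response never arrived
status, headers = server.post(order)  # the client's blind retry

print(headers["Location"])  # /purchase-orders/2
print(len(server.orders))   # 2: the order was placed twice
```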

I think Mr. Carlyle is correct that using POST for resource creation is sub-optimal. He suggests the following approach1: suppose you have a new purchase order resource you want to make known to the server. The client generates a GUID and then issues the following request

PUT /purchaseOrders/76fd9473-a270-4aac-8a06-e5265048cbbc HTTP/1.1
Host: example.com
Content-Length: ....

<PurchaseOrder>
...
</PurchaseOrder>

The server thinks “I do not know of a purchase order with an id of 76fd9473-a270-4aac-8a06-e5265048cbbc, so this request is regarding a never-before-seen purchase order.” It then goes about storing that brand new purchase order and returns a “201 Created” response. From a RESTful point of view this is a reasonable approach. It relies only on standard PUT semantics, is resource focused, and is idempotent, so you can safely keep repeating the request until you get a response.
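Here is a minimal Python sketch of the client-generated GUID scheme. It assumes a hypothetical in-memory `PurchaseOrders` store and uses `uuid.uuid4()` to produce a random UUID. Repeating the PUT leaves exactly one purchase order in place. Answering 200 to the repeated PUT is an assumption; RFC 2616 allows either 200 or 204 when an existing resource is modified.

```python
import uuid


class PurchaseOrders:
    """Hypothetical in-memory store keyed by the path the client chose."""

    def __init__(self):
        self.orders = {}

    def put(self, path, body):
        # "I do not know of a purchase order with this id" -> create it.
        created = path not in self.orders
        self.orders[path] = body
        return 201 if created else 200


def new_order_path():
    # The client mints the id itself; uuid4 ids are random, so collisions
    # between independent clients are vanishingly unlikely.
    return f"/purchaseOrders/{uuid.uuid4()}"


store = PurchaseOrders()
path = new_order_path()
body = "<PurchaseOrder>...</PurchaseOrder>"
print(store.put(path, body))  # 201
print(store.put(path, body))  # 200: the retry is harmless
print(len(store.orders))      # 1
```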

While relatively straightforward, the approach does require a lot of out-of-band information, because the client has to understand the server’s id generation scheme and be able to reliably generate a new id that is unique. If you are using GUIDs as the resource ids, you can be reasonably assured that clients can, in fact, generate globally unique ids, so there is not really a practical short-term problem.

Some potential problems arise when you have a server that uses a scheme for which ids cannot be reliably generated on the client, such as monotonically increasing numbers, or where there would be a different way to figure the id for each type of resource, such as natural keys. It also tightly couples the clients to the server implementation in ways I think are dangerous. For example, what happens if you decide you would like to move to a URL scheme that is less ugly than GUIDs?

For all those situations, Mr. Carlyle proposes continuing to use a GUID-based URI for resource creation PUTs but, rather than actually creating the resource, having the server respond with a permanent redirect to a new URI at which the resource should exist. The client would then re-issue the PUT request to the new URI, and only then would the server create the new resource.

I really like the general arc of this proposal but I despise the GUID part. GUIDs are ugly and they offend my sense of elegance. However, this proposed approach can be easily tweaked into something I really like. Suppose that rather than using GUID-based URIs you instead used a “new resource” URI. For example, you could say

PUT /purchaseOrders/new HTTP/1.1
Host: example.com
Content-Length: ....

<PurchaseOrder>
...
</PurchaseOrder>

the server would respond with

HTTP/1.1 303 See Other
Location: http://example.com/purchaseOrders/any_random_sort_of_id_the_server_wants
...

and the client would re-issue the PUT to the URI specified in the redirect. This pushes URI generation back to the server, where it belongs, while still retaining the general goodness of having all interactions between the client and server be idempotent.

It’s just too bad this approach does not work in a browser2.


  1. I rephrase it here to make sure I really understand it and because I found the examples in the original a little hard to follow.

  2. According to RFC 2616 (section 10.3) a redirection

    MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

    So while technically you could do this in a browser the user experience would suck.

Some thoughts about PHP

I have been using PHP1 as my primary language for five months now, so I feel somewhat qualified to speak about it. My overall conclusion is that PHP is weak sauce. It is easy to get started with PHP, but its usefulness decreases as the complexity of the application increases. This is primarily because keeping PHP code maintainable requires unnatural levels of discipline. Choosing PHP for a greenfield project is a technical risk that is unnecessary in today’s rich web application ecosystem. Of course, most projects aren’t greenfield, so there are plenty of reasons to have PHP around.

I find PHP to be a deeply frustrating environment in which to work. It seems to have occurred, as opposed to being designed or even evolved, without much of an overarching vision. In “The Mythical Man-Month” Frederick Brooks claims that “conceptual integrity is the most important consideration in system design”. Unfortunately, PHP has very weak conceptual integrity. It seems mostly to be a collection of decisions that seemed expedient at the time, with little thought about how they would impact the overall system.

I remember hearing a few years ago about PHP changing its syntax to appear more Java-like. That was back when it looked like there was going to be a real Java hegemony in business programming. At the time I thought it was a slightly odd but defensible idea. Today I see that decision as an indicator of weak conceptual integrity. I think any system willing to give up its character so completely must lack, almost by definition, the conceptual integrity needed to be great.

I covered several concrete issues with PHP in Early impressions of PHP. All of those issues still stand, but my biggest problem with PHP, after working with it for a while, is that it seems designed to actively discourage meta-programming. This means I find myself writing annoying amounts of boilerplate code2. I strongly believe that the future of programming is language-oriented. This makes PHP hard for me because even rudimentary language-oriented techniques are simply not feasible in PHP.

A somewhat more minor annoyance is the lack of closures and blocks. I first learned blocks and closures about two years ago and now find programming without them mildly painful. I think that Mark Jason Dominus got it right when he said

in another thirty years people will laugh at anyone who tries to invent a language without closures, just as they’ll laugh now at anyone who tries to invent a language without recursion.

There are just so many common classes of problem that are simply and cleanly solved by closures that not having them seems like a crime. I hope it does not take thirty years, though.
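For readers who have not used closures, here is a small illustration in Python rather than PHP, since the point is what the post says PHP 4 lacks. The function names are made up for the example. The first function captures a value. The second keeps private, mutable state between calls.

```python
def over_limit(limit):
    """Return a predicate that remembers `limit` after over_limit returns."""
    def check(total):
        return total > limit  # `limit` is captured from the enclosing scope
    return check


def make_counter():
    """Return a function that keeps private state between calls."""
    count = 0

    def increment():
        nonlocal count  # rebind the captured variable, not a new local
        count += 1
        return count
    return increment


big_orders = over_limit(100)
print([t for t in [50, 150, 250] if big_orders(t)])  # [150, 250]

next_id = make_counter()
print(next_id(), next_id(), next_id())  # 1 2 3
```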

It would not be fair to leave this post without a discussion of the good things about PHP. PHP excels at lowering the barrier to entry. There is no other system I am aware of that even comes close to the ease of getting started with PHP. The weakness of PHP’s conceptual integrity does not seem to noticeably impact productivity in the context of small systems. The idea that you can have a web application by creating one text file and copying it to the web server is radically powerful.

And then there is Smarty. Smarty is a really nice external DSL for generating web pages, i.e. a template engine. The core of Smarty is well thought out and has very nice extension mechanisms. It is a joy to work with. If you are doing PHP work, I strongly recommend Smarty.


  1. As always, I need to point out that I am speaking primarily about PHP 4. I suspect that PHP 5 suffers from many of these issues but I have spent very little time in PHP 5.

  2. Usually this boilerplate code gets written after I have already spent an inordinate amount of time trying and failing to automate it. The usable area in PHP seems remarkably small. That means it is going to take a bit more running into the edges before I completely accept that they are really that close.

Selenium

I finally got around to setting up Selenium and writing a few tests yesterday. I have been meaning to do that ever since Charlie pointed it out to me a few months ago. Man, I wish I hadn’t waited. Selenium is easily the best way to test a web app that I have ever tried. With Selenium your application is tested in the same environment in which it will be used: the browser. And it is easy to use, to boot.

Just to be clear, Selenium is not a replacement for unit tests1. It is a functional or acceptance testing tool. It is slower2 and its results are probably less specific than those of most unit tests. However, if you need to verify that your application actually works from the user’s perspective (and you do), Selenium is the tool for you. The tests are run in the target browser(s), so the tests fail when the user would see a failure, even if the problem is some esoteric browser compatibility issue.

If you are developing web applications and are not already using it, give Selenium a try. You will not regret it.

Postscript: Trouble in paradise

I am having one problem with Selenium, though. It is really, really slow in Internet Explorer 6 on my Ubuntu box. A test suite that takes 1 minute 15 seconds in Firefox takes 12 minutes and 42 seconds in IE (same machine, network, etc.). I expected IE to be slower, both because it is IE and because it is running under Wine, but I expected the difference to be measured in percentages, not a factor of about ten (762 seconds versus 75 seconds). If anyone has any ideas about how to make Selenium run faster in IE on Linux, I would love to hear them.


  1. I consider functional tests in Rails to be unit tests for the purpose of this discussion.

  2. Selenium actually makes the same HTTP calls that will be made in the wild. This is a good way to test, but it has significant overhead compared to an approach, like “functional” tests in Rails, that bypasses the HTTP layer altogether.

Megalomania in Revision Control Systems

The more I work with Perforce, the more annoyed I get with its belief that it is the center of the software development process. Not that Perforce is unique in believing this. In fact, many commercial revision control systems have this same megalomania.

So, commercial revision control vendors, let me clue you in. Revision control systems are not the center of the software development universe; the file system on the developer’s machine is. Period. The primary job of a revision control system is to figure out what the developer did, after the developer has already done it, and to save that so that it can be applied to the other developers’ local file systems.

I am complaining about this because out of the box Perforce does not include a way to examine your working directories and get a list of what has changed since the last time you synchronized with the depot (Perforce’s word for a repository). This means that you must declare any changes you make to the source code tree before, or at the same time as, you make them. If you do not declare the changes up front it is almost certain that some required changes will be forgotten and will not get checked in.
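Working out what changed after the fact takes only a few lines of Python: snapshot the working directory's file contents, then diff two snapshots. This is a simplified illustration of the idea, not how Perforce, svn or hg actually implement their status commands. The `snapshot` and `status` names are hypothetical.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each file path (relative to root) to a SHA-1 hash of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                result[os.path.relpath(full, root)] = hashlib.sha1(f.read()).hexdigest()
    return result


def status(before, after):
    """Classify what the developer did between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys()
                           if before[p] != after[p]),
    }


with tempfile.TemporaryDirectory() as root:
    def write(name, text):
        with open(os.path.join(root, name), "w") as f:
            f.write(text)

    write("a.txt", "one")
    write("b.txt", "two")
    synced = snapshot(root)           # state at the last sync

    write("a.txt", "one, edited")     # edit without telling anyone first
    os.remove(os.path.join(root, "b.txt"))
    write("c.txt", "three")
    changes = status(synced, snapshot(root))

print(changes)  # {'added': ['c.txt'], 'deleted': ['b.txt'], 'modified': ['a.txt']}
```

With this approach a rename shows up as one deletion plus one addition, and a file that was touched but left unchanged does not appear at all.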

The main problem with this approach is that it compels you to break your flow. You might not think that just having to execute one additional command would really be that bad. After all, executing a command could probably be incorporated into your personal process and thereby have little or no impact on your flow. In practice, however, it really does break the flow.

The true cost of declaring that a new file will exist with a particular name is not in the act of declaration. Rather, it is in the thought and commitment that such a declaration requires. Every day, I create new classes only to notice a couple of minutes later that they have morphed into something quite different from what I originally intended. Declaring the name of a new file to the revision control system before I have even created it dramatically raises the cost of such emerging behaviors. To name a file well I need to understand its purpose, and to do that I usually need to create it and put some code in it to see how it turns out.

Having to declare the name of the file first requires a significant amount of thought and, worst of all, it prematurely solidifies the design. This premature solidification of design is costly both in terms of the time it takes and the quality risks it introduces. Paul Graham makes a compelling case for a fluid approach to software design in Hackers and Painters. Forcing stability on an immature design merely locks in its deficiencies, and that is exactly the result of requiring the declaration of a file before it is created.

The need to declare modifications and deletions up-front is less onerous. Deletions are easier because, generally, it is easy to understand the consequences of deleting a file because there is only one. Modifications are easier because marking a file as “to be edited” is quite forgiving. Even if you fail to actually modify the file, checking it in will have no functional impact. Nonetheless, having to declare your intentions up-front is still annoying, and wasteful, since they could be determined after the fact.

If you are a commercial revision control system vendor please take note. You are not, nor will you ever be, the center of the software development universe. Pretending that you are just costs your users time and effort that could be better spent solving real problems.

Mercurial and Perforce

I have been learning two different source control systems in the last couple of weeks. Source control systems are a vital tool in any software development activity so I want to share my impressions of these tools.

Mercurial

I finally got around to setting up a repository for my personal projects1. For these projects, I chose to use Mercurial. I have been hearing a bit of a buzz around Mercurial for a while and I decided it would do me good to try out one of these newfangled distributed source control systems2. So far, I am impressed. Mercurial is easy to set up and use, and it seems quite capable.

So far my only real complaint is that Mercurial does not version directories. It only versions files, and the directory structure is treated as metadata about those files. This makes it impossible to have empty directories in the source tree. Not a huge deal, but it is annoying at times.

Distributed source control is a bit of a different mindset than I am used to. The sort of source control systems I am familiar with are built around the idea that there is one-true-version of the code and that the one-true-version lives in the central repository. I think it may take a while to get completely comfortable with a source control system that believes that there are lots of versions of a project, all of which are equally valid.

My unfamiliarity aside, I like the distributed approach because I think it accepts the reality of software development. There are usually multiple useful and meaningful versions of a particular project at any given time and a tool that embraces that fact ought to work better than one that does not. I need to use Mercurial quite a bit more to be sure it lives up to that promise, but so far it is looking good.

Perforce

At work, the SCM team recently settled on Perforce as our standard source control system and I get to be the first person in my group to use it for a real project. I have been using it for a couple of weeks now and I can say that it is cross-platform3, reliable, and fast. Those are all virtues; unfortunately, Perforce falls way short on ease of use.

For example, you might think that something like

p4 rename old_name.rb new_name.rb

would rename the file, but in fact it just prints this

See ‘p4 help rename’ for instructions on renaming files.

And when you run that it prints a bit of text outlining a multi-step process, involving branching the file and then deleting the original, that will result in the file being renamed.

If you want to edit a file, don’t even think about just opening it with your favorite editor and going to town. No, you need to explicitly inform Perforce that you are going to edit the file by running p4 edit myfile.txt. As far as I can tell, Perforce does not have a way to list what you have changed since the last check-out (or sync in Perforce terminology).4 This means that you cannot just edit and change to your heart’s content and then notify Perforce of the changes in a batch; no, you have to interrupt your flow to inform the source control system that you are about to change a file.

The requirement to manually “open files for editing” and similar issues would probably be easier to deal with using the Perforce Emacs integration but I really despise tools that require IDE integration to be usable. Variety’s the spice of life. Sometimes Emacs will do it, sometimes – not often, but sometimes – I like the idea of using vi.

I have used much worse source control systems than Perforce but I would never choose it over the veritable cornucopia of better, and free, source control systems available.


  1. So far there is just one public project, PHP Markdown (with footnotes).

  2. Well, all that and it is written in Python.

  3. I have used it successfully on Windows, Linux, and FreeBSD with no problems.

  4. I find it really hard to believe that there is no Perforce equivalent to the status command in svn, cvs, hg, etc. I look for it in the documentation every couple of days, but so far any way to do this has eluded me.

Namespaces

Alex Bunardzic has written an interesting article about namespaces. He calls into question the usefulness of the canonical approach of hierarchically structured namespaces. I think he’s onto something, but his examples are pretty weak. It is easy to pick on Java package names because they are often over the top. Unfortunately, Java package names are the namespaces that are most familiar. Most of the weirdness of Java package names comes from the fact that they tend to be used as a way to categorize entities1. Using namespaces as a categorization mechanism is misguided and generally results in package names that are both bad namespaces and bad categories.

The primary use of namespaces is name disambiguation. This is a vital feature of programming languages. Any language without integrated namespace support has a gaping hole. Name disambiguation is provided by allowing a name in a particular namespace and that same name in another namespace to co-exist simultaneously and independently.

Name disambiguation is necessary when multiple implementations of the same logical class exist. Name disambiguation is rarely required within the application layer. Developers generally have enough visibility into their own code base to avoid implementing duplicate classes. When naming conflicts occur it is usually between two independent libraries or between the application and the libraries it uses.
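Here is a small Python illustration of the library-versus-library case, using two hypothetical libraries, `csv_lib` and `words_lib`. In practice each would be a separately installed package. Here the module objects are built in memory with `types.ModuleType` so the sketch is self-contained. Each module acts as a namespace, so both can export a class called `Parser` without colliding.

```python
import types


def make_library(name, parse):
    """Build a module object (a namespace) exporting a class named Parser."""
    module = types.ModuleType(name)
    # Both libraries deliberately use the same class name.
    module.Parser = type("Parser", (), {"parse": lambda self, text: parse(text)})
    return module


# Two hypothetical, independently written libraries.
csv_lib = make_library("csv_lib", lambda text: text.split(","))
words_lib = make_library("words_lib", lambda text: text.split())

print(csv_lib.Parser.__name__ == words_lib.Parser.__name__)  # True: same name
print(csv_lib.Parser().parse("a,b c"))    # ['a', 'b c']
print(words_lib.Parser().parse("a,b c"))  # ['a,b', 'c']
```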

Namespaces should be used exclusively to prevent name conflicts. Class names should always include enough information that a reasonably informed person will know the purpose of the class. This means that if you have an accounts payable object, it is called “AccountPayable” regardless of the namespace in which it lives.

Name conflicts are a real issue. Code that will be used in an unknown environment should be in a non-default namespace. That means if you are writing a reusable library, it should have its own namespace. Namespaces should usually match independently installable components. Any namespace that you can only get by installing some other package probably should not exist.2

Just to be clear, namespaces are pure overhead. They are necessary in many situations but namespaces add no value to the application. One way to minimize the overhead is to put your application classes in the default (or null or root or whatever your language calls it) namespace. That means that the classes that you spend most of your time using do not require any namespace overhead. You will be responsible for ensuring that your code base does not include multiple classes with the same name. That is a constraint you should embrace because having multiple classes with the same name is just confusing.


  1. I use the word entity because in Java the things you name are not really objects.

  2. An interesting effect of this is that in a system with installation-type packaging built in at the core, you could treat namespaces as an aspect of the package management system. I wonder if you could retrofit that on top of RubyGems such that loading a gem resulted in all the classes being contained in a module with the same name as the gem?

PHP Markdown with Footnotes

I have been using Markdown to write my blog posts for quite some time now and I really like it. I was a little surprised by how much I liked it when I first switched, because before that I had been writing in HTML, which works well. But it really is easier to focus on the content when you don’t have to worry about all the minutiae involved with XML-based formats.

The only real problem I had with Markdown (the syntax and PHP Markdown) was that it does not support footnotes. I really like footnotes, so that was sad. Many months ago I noticed the footnote support in MultiMarkdown. It is really clean and the intent is obvious from the text. I liked it so much that I started using it in places that I never intended to convert to another format (readme files, comment blocks in code, etc). The whole time since I noticed the footnote syntax, I have been waiting for someone to add that feature to PHP Markdown so that I can use it with WordPress.

Well, I finally got tired of waiting, so yesterday I implemented footnote support in PHP Markdown1. The result is a drop-in replacement for the original PHP Markdown2. You can download it here.

Feel free to use it to your heart’s content, but be aware that it is – how shall I put this – fairly minimally tested3.

Defining footnotes

A footnote is defined like this

[^my-footnote]: Explain the tangential thing that just occurred to me.

This will produce the following at the end of the output (if it is referenced).

<ol class="footnotes">
<li id="footnote-my-footnote-(markdown document unique id)">
<p>Explain the tangential thing that just occurred to me.</p>
</li>
</ol>

The “markdown document unique id” is an md5 hash of the original Markdown text we are converting. This is needed to keep footnotes with the same name but in different articles separate from one another when they appear on the same HTML page, as they do in a blog.
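The id scheme described above can be sketched in Python with `hashlib.md5`. This is an illustrative reimplementation, not the PHP Markdown code. The `footnote_id` helper is hypothetical, and hashing the UTF-8 encoding of the text is an assumption.

```python
import hashlib


def footnote_id(name, markdown_text):
    """Build a footnote element id that is unique per source document."""
    doc_id = hashlib.md5(markdown_text.encode("utf-8")).hexdigest()
    return f"footnote-{name}-{doc_id}"


post_a = "First post[^my-footnote].\n\n[^my-footnote]: A tangent.\n"
post_b = "Second post[^my-footnote].\n\n[^my-footnote]: Another tangent.\n"

# Same footnote name, different documents -> different ids on the same page.
print(footnote_id("my-footnote", post_a) != footnote_id("my-footnote", post_b))  # True
```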

The text content of the footnote is processed as a normal Markdown block, so you can use all your favorite Markdown syntax, both block and span elements, inside of it. A footnote is all the text from the [^footnote-name]: bit until the first line with a non-space character in its first column following a blank line. For example

[^my-footnote]: Something tangential just 
    occurred to me.
    
    This is a second paragraph in my footnote.

This is just another paragraph in the document, *not* part of the
footnote.

Code is also supported in footnotes, but it must be indented eight spaces, or two tabs, similar to having code in a list.
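Here is a rough Python sketch of the extent rule above: a footnote runs from its `[^name]:` line until a line with a non-space character in column one that follows a blank line. `extract_footnotes` is a hypothetical helper, not the PHP Markdown implementation. It also does not strip the indentation from continuation lines, which a real implementation would do before handing the footnote to the Markdown block parser.

```python
import re

# A definition line looks like "[^name]: text".
DEFINITION = re.compile(r"^\[\^([^\]]+)\]:[ \t]*(.*)$")


def extract_footnotes(lines):
    """Split Markdown lines into (body_lines, {name: footnote_lines})."""
    body, notes = [], {}
    current = None          # footnote currently being collected, if any
    previous_blank = False
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            current = match.group(1)
            notes[current] = [match.group(2)]
        elif current is not None and previous_blank and line[:1] not in ("", " ", "\t"):
            # Non-space in column one after a blank line: the footnote is over.
            current = None
            body.append(line)
        elif current is not None:
            notes[current].append(line)
        else:
            body.append(line)
        previous_blank = line.strip() == ""
    return body, notes


source = [
    "[^my-footnote]: Something tangential just",
    "    occurred to me.",
    "    ",
    "    This is a second paragraph in my footnote.",
    "",
    "This is just another paragraph in the document, *not* part of the",
    "footnote.",
]
body, notes = extract_footnotes(source)
print(body)                        # the last two lines only
print(len(notes["my-footnote"]))  # 5
```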

Referencing footnotes

Footnotes that are never referenced are stripped, so to get the above output you need to reference the footnote somewhere else in the document. This is done like this:

I have some stuff to say[^or-not].

If there is an ‘or-not’ footnote defined, the following output will result

<p>I have some stuff to say<a href="#footnote-or-not-(markdown document unique id)" class="footnote-ref">1</a>.</p>

References to undefined footnotes are ignored and the [^name] text is left in the output.

The numbering of footnotes is based on the order of the references, not on the order of definition. So the first footnote you define may end up with a number other than 1 if it is not the first footnote referenced.
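Here is a rough Python sketch of the reference rules above. It assumes definitions have already been pulled out of the text; otherwise the `[^name]:` definition lines would match too. `number_references` is a hypothetical helper, and the markup it emits follows the example output shown earlier in this post.

```python
import re

REFERENCE = re.compile(r"\[\^([^\]]+)\]")


def number_references(text, defined, doc_id):
    """Number footnote references in order of first appearance.

    Returns the rewritten text and the footnote names in display order.
    Undefined references stay as literal "[^name]" text; defined but
    unreferenced footnotes never enter the order, so they can be dropped.
    """
    order = []

    def replace(match):
        name = match.group(1)
        if name not in defined:
            return match.group(0)  # leave "[^name]" in the output
        if name not in order:
            order.append(name)
        number = order.index(name) + 1
        return (f'<a href="#footnote-{name}-{doc_id}" '
                f'class="footnote-ref">{number}</a>')

    return REFERENCE.sub(replace, text), order


text = "First[^b], then[^a], again[^b], missing[^zzz]."
html, order = number_references(text, {"a", "b", "unused"}, "DOCID")
print(order)  # ['b', 'a']
```

Footnote `b` gets number 1 even if `a` was defined first, because `b` is referenced first.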

Happy Marking down.


  1. I started from version “Extra 1.0.1” and added a few functions to handle footnotes.

  2. I have only tested this standalone and as a WordPress plug-in. I did not change the code related to other systems, nor the signatures of existing functions, so it should work just like the original version. YMMV.

  3. The footnote behavior is unit tested relatively well – if you are interested in the tests, just let me know and I will send them to you – but the rest of the behavior of the code is untested, so there might be some non-obvious interaction problems.