20 Nov 2007
•
Miscellaneous
My dad is a carpenter. He has always taken a great deal of pride in his craft. By the time I was old enough to work with him he was doing mostly finish work. I occasionally got to work with him when I was younger, but there were always lots of things I was not allowed to do. Mostly, as it turns out, because I would not have been able to do them very well. Now that I am older, I really appreciate the fact that my dad took enough pride in a job well done not to let me mess it up. And I am sure his customers appreciated it even more.
Based on my experiences with my dad, I had sort of assumed that attention to fit and finish was a part of doing “finish” work. Not so much, it turns out.
We are having our kitchen remodeled at the moment and our new granite counter tops are not quite level, and there are some spots that really should have epoxy but don’t. The new floor is beautiful, if you don’t look too close. If you do look close, however, you will see that there is a lot of dirt, hair and who knows what else in the finish. Not to mention that the planks are mostly only marginally smoother now that the floor is “done” than they were when they were first brought into our house.
We are, thankfully, nearing the end of our kitchen remodel. I will be glad when it is over, regardless of the outcome. I am, however, really disappointed by the workmanship that has gone into it. I find this extremely frustrating, and not just because I feel ripped off. Don’t these people want to do a good job and explore and refine their craft?
The answer is, obviously, no. I suppose that is because for most of them this is “just a job”. I have had quite a few different jobs in my life but none of them has been “just a job”. In fact, I have a hard time even imagining what that would be like.
It makes me sad to think of these people spending so much of their time doing something that is not even worth doing well. But mostly it makes me angry that they are doing it on my dime. Perhaps companies could just offer an “it won’t suck” upgrade to their normal bids and use real craftsmen for those jobs. I, for one, would be willing to pay a bit extra up front for a job well done.
11 Nov 2007
•
Software Development
Ara Howard has discovered that the ActiveRecord validation mechanism does not ensure data integrity.1 Validations feel a bit like database constraints but it turns out they are really only useful for producing human friendly error messages.
This is because the assertions they define are tested by reading from the database before the changes are written to the database. As you will no doubt recall, phantom reads are not prevented by any isolation level other than serializable. So unless you are running your database in serializable isolation mode (and you aren’t, because nobody does), using ActiveRecord validations sets up a classic race condition: two concurrent requests can both run the validation query, both see no conflicting row, and both insert one.
On my work project we found this out the hard way. The appearance of multiple records that should have been blocked by the validations was a bit surprising. In our case, the impacted models happened to be immutable, so we only had to solve this for the #find_or_create case. We ended up reimplementing #find_or_create so that it does the following (a sketch follows the list):
1. do a find
2. if we found a matching record, return the model object
3. if it does not exist, create a savepoint
4. insert the record
5. if the insert succeeds, return the new model object
6. if the insert failed, roll back to the savepoint
7. re-run the find and return the result
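Here is a minimal sketch of what that reimplementation might look like, assuming a unique index backs the lookup and a Rails version with savepoint support via transaction(:requires_new => true); the model and column names are hypothetical:

class Subscription < ActiveRecord::Base
  # Assumes a UNIQUE constraint on the email column.
  def self.find_or_create_by_email(email)
    # Steps 1-2: try the find first; the common case needs no write.
    record = find_by_email(email)
    return record if record

    begin
      # Steps 3-5: insert inside a savepoint so a failed insert does
      # not poison any enclosing transaction.
      transaction(:requires_new => true) { create!(:email => email) }
    rescue ActiveRecord::StatementInvalid
      # Steps 6-7: the unique constraint rejected our insert, so a
      # concurrent transaction must have won the race; re-run the find.
      find_by_email(email)
    end
  end
end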
This approach does require the use of database constraints, but having your data integrity constraints separated from the data model definition has always felt a bit awkward to me anyway. So I think this is more of a feature than a bug.
It would be really nice if this behavior were included by default in ActiveRecord. A similar approach could be used to remove the race conditions in regular creates and saves by simply detecting the insert or update failures and re-executing the validations. This would not even require that the validations/constraints be duplicated. The validations could, in most cases, be generated mechanically from the database constraints. For example, DrySQL already does this.
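To make the idea concrete, a save along these lines might look something like the following sketch, using the alias_method_chain idiom; the method names are mine and the exception handling would vary by database adapter:

class ActiveRecord::Base
  def save_with_constraint_fallback(*args)
    save_without_constraint_fallback(*args)
  rescue ActiveRecord::StatementInvalid
    # The database constraint rejected the write; re-run the
    # validations so self.errors carries the friendly messages.
    valid?
    false
  end
  alias_method_chain :save, :constraint_fallback
end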
Such an approach would provide the pretty error messages Rails users expect, neatly combined with the data integrity guarantees that users of modern databases expect.
10 Nov 2007
•
Personal
This weekend Elliot, Audrey and I went for a hike on Fowler Trail1. As always, Elliot was playing with sticks along the way. At some point he showed me a stick and said, “look Daddy, it’s a gun”. I replied, “Oh, OK”. I am not a big fan of gun play, but I realize that it is inevitable so I just try not to make a big deal out of it. Elliot was, predictably, completely unfazed by my lack of enthusiasm and continued playing with his “gun”.
At about the one mile mark the trail leaves Eldorado Canyon State Park and there is a sign with several icons indicating what is and isn’t allowed on the trail past that point. Elliot asked me what the sign said, so I told him, “it says no dogs, no camping, no fires and no guns”. Elliot immediately says, “oh” and drops the stick gun right there in the middle of the trail.
23 Oct 2007
•
Software Development
Authentication has been the bane of my existence lately. By which I mean, it is complicated and interesting and I am loving every minute of it (but, as you can see, I am not going to let that stop me from complaining about it). However, tonight I have run into an authentication problem that I am not sure how to solve. I am hoping someone out there can point me toward a solution.
So, Dear Lazyweb, here is my question: is there a mechanism available that allows an HTTP client to have a single identity for several applications and to be able to authenticate itself to each of those applications in such a way that even a malicious application would be unable to impersonate the actor to the other applications in the system?
Oh yeah, and it would be really nice if this were already implemented for libcurl and in Ruby.
Some background
I have a system composed of a set of applications which communicate with one another using RESTful web services. This system supports the addition of arbitrary new applications. However, some of these applications may be written by (relatively) untrusted parties.
All actors, both end users and components of the system, have a single system-wide identity. This identity is managed by the one trusted component in the system. This component is responsible for, amongst other things, authentication of actors.
We settled on OpenID as the mechanism for end user authentication. Other than having one of the worst specs I have ever read, OpenID is really nice. OpenID solves this problem by forwarding the user’s browser to the identity provider (our trusted component), which verifies the user’s identity claim. The application that requested the authentication is then notified of the success or failure of the authentication process. This approach has the advantage that the user’s password, even in an encrypted form, never passes through the untrusted components of the system.
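For the curious, the relying-party side of that flow looks roughly like this with the ruby-openid library (a sketch assuming the 2.x-style API and a Rails-style controller; the store path and URLs are made up):

require 'openid'
require 'openid/store/filesystem'

# Phase 1: kick off authentication by redirecting the browser to
# the identity provider.
store = OpenID::Store::Filesystem.new('/tmp/openid-store')
consumer = OpenID::Consumer.new(session, store)
checkid = consumer.begin(params[:openid_identifier])
redirect_to checkid.redirect_url('http://app.example/',
                                 'http://app.example/openid/complete')

# Phase 2: back at the return_to URL, verify the provider's answer.
response = consumer.complete(params, request.url)
authenticated = (response.status == OpenID::Consumer::SUCCESS)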
Unfortunately, end user authentication is only a subset of the authentication required for this system. There are many automated actors that also make use of the resources exposed by components in the system. These actors need to be authenticated too, but OpenID is rather unsatisfactory for this purpose. So another solution for delegated authentication is required.
My initial thought was to use MD5-sess based HTTP Digest auth. The spec explicitly mentions that it could be used to implement authentication using a third party identity provider. Upon further study, however, it turns out it only works if the application doing the authentication is trusted. This is because, to verify the requester’s identity, the application must have the hash of the user’s account, realm and password. With that bit of information it would be quite easy for the application to impersonate the original requester. In my environment of limited trust that is unacceptable.
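You can see the problem in the RFC 2617 arithmetic: everything the client can compute, a holder of H(username:realm:password) can compute too. A quick illustration in Ruby (parameter names mine):

require 'digest/md5'

# With algorithm=MD5-sess the request digest is derived entirely
# from ha1 = H(username:realm:password) plus values visible on the
# wire, so any application holding ha1 can forge valid responses.
def md5_sess_response(ha1, nonce, nc, cnonce, qop, method, uri)
  session_key = Digest::MD5.hexdigest("#{ha1}:#{nonce}:#{cnonce}")
  ha2 = Digest::MD5.hexdigest("#{method}:#{uri}")
  Digest::MD5.hexdigest("#{session_key}:#{nonce}:#{nc}:#{cnonce}:#{qop}:#{ha2}")
end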
Another potential, if naive, option is to use HTTP Digest auth but to pass the authentication credentials through to the identity provider. The identity provider could then respond with an indication of whether the requester proved that they knew the password. Unfortunately, the additional load placed on the identity provider by having to verify the requester’s identity for every single request handled by any part of the system is just too great. Not to mention the additional lag this would impose on response times.
Now, the astute reader will by now be fairly yelling something about how this problem was solved by Kerberos years ago. Not only is this true but, theoretically, the Negotiate HTTP auth scheme supports Kerberos based authentication. However, I have yet to find any Ruby libraries that support that scheme. Tomorrow, I will probably dive into the RFC to determine if I can implement support myself. If you know of a library that implements this scheme please let me know.
I have also looked at OpenID HTTP authentication. It looks a bit simpler than the Kerberos based Negotiate auth scheme, but it seems a bit undercooked for a production system. On the other hand, it does have potential. If there are no other options it might be workable. It would be pretty easy to implement on the Ruby side of the house, particularly since I have spent the last couple of days coming to terms with OpenID, but on the C++ side it might be a bit more of a problem.
Anyway, it is late now so I am going to go to sleep and await your answers.
12 Oct 2007
Patrick Mueller contemplates whether or not we really need URIs in our documents1. This is a pretty common question in my experience. This question comes up because it is not always immediately obvious just how powerful embedding links in documents is.
What Mr. Mueller suggests is that if you have a client that needs account information for a particular person, it could simply take the account numbers found in the person representation and, based on some out of band information, construct the account URIs. For example, if you got a person representation that looked like
<person>
<accounts>
<account><id>3242</id></account>
<account><id>5523</id></account>
</accounts>
</person>
The client could then make GET requests to http://bank.example/accounts/3242 and http://bank.example/accounts/5523 to get the person’s account information. The client would have constructed those URIs based on some configuration or compile time information about the structure of account URIs. This is a very common approach. Hell, it is even the one used by the ActiveResource library in Rails. But common does not make it good.
Magically creating URIs out of the ether would work at first, but say this bank we work for buys another bank. There are some people that have accounts at both banks. Now, if a person’s accounts were referenced by URI, rather than just by number, you could just add them to the list like this:
<person>
<accounts>
<account href="http://bank.example/accounts/3242"/>
<account href="http://bank.example/accounts/5523"/>
<account href="http://other-bank.example/accounts/9823"/>
</accounts>
</person>
The fact that some accounts are served by the original system and some are served by the other bank’s system is unimportant. However, if the client is constructing URIs based on out of band information this approach fails completely. This is just one example of the sort of problems that disappear when you reference resources by URI, rather than by some disembodied id.
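To make that concrete, a client that simply follows the advertised hrefs needs no changes at all when the second bank shows up. A sketch using Ruby’s standard library (the person URL is hypothetical):

require 'open-uri'
require 'rexml/document'

# Fetch the person document and follow whatever account hrefs it
# advertises; accounts on other-bank.example need no special case.
person = REXML::Document.new(open('http://bank.example/people/42'))
person.elements.each('person/accounts/account') do |account|
  account_doc = open(account.attributes['href']).read
  # ... process the account representation ...
end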
One of the potential advantages of using just ids, rather than URIs, is that it requires less work on the server to generate the document. I suppose ids are less costly, in a strict sense, if the server generating the document also serves the account resources. But how much cheaper? Building a URI like the ones above could be as cheap as a single string concatenation. As far as I am concerned, that is not enough work to be worth worrying about. On the other hand, if the server generating the document does not also serve the account resources, then the accounts should be referenced by URI internally anyway, so using the URI should be cheaper (not to mention safer).
Mr. Mueller suggests, as a proof point, that Google Maps must work by URI construction based on a priori knowledge of the shape of tile URIs. It may well, for all I know, but it certainly would not have to. For example, the server could pass the client a tile URI template and the client could then calculate the x and y offsets of the required tiles based on the x and y values of the tiles it already has. Or each tile could include links to the tiles that touch it (which would allow arbitrary partitioning of the tiles, which would be nice). No doubt there are other reasonable RESTful choices too.
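The template variant is almost embarrassingly simple; something like this, where the template string and its syntax are made up for illustration:

# The server advertises a template; the client only fills in the
# coordinates it computes from the tiles it already has.
template = 'http://maps.example/tiles/{zoom}/{x}/{y}.png'

def tile_uri(template, zoom, x, y)
  template.sub('{zoom}', zoom.to_s).sub('{x}', x.to_s).sub('{y}', y.to_s)
end

tile_uri(template, 12, 681, 1426)
# => "http://maps.example/tiles/12/681/1426.png"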
The more I work with REST based architectures the more enamored of hypermedia I become. Links make your representations brightly lit, well connected spaces, and that will benefit your application in ways you probably have not even imagined yet.