Pipes Dreams

As you know, i am on a job hunt. Like everyone else, i use a variety of sources to find potential opportunities. I also have a wide breadth of skills, so i search on a lot of keywords. Going to 10 different websites and doing 2 to 20 searches on each gets old really fast. Worse yet, some of these sites don’t have particularly sophisticated search capabilities, so you end up having to wade through a lot of irrelevant postings.

So, over the weekend i decided to play with Yahoo Pipes to aggregate and filter all the job posting feeds. Pipes is quite impressive. The UI is very slick. The drag-and-drop, connect-the-boxes presentation really puts the user at ease. Creating a basic version of the pipe i wanted was surprisingly easy. There was definitely some learning to be done, but i had a working prototype up in very short order.

Almost immediately i was disappointed by the lack of a textual representation of the data extraction and transformation rules that make up a “pipe”. Being a programmer by trade, i always feel happier when i can use my favorite text editor, not to mention source control, for tasks like these. (You mean you don’t edit text fields in your browser with Emacs? Why not?) However, my wife, an ETL expert, assured me that visual editing is the norm for tools such as Pipes.

Once i started expanding beyond the initial prototype, the dream really started collapsing around me. My pipe had a fairly complex set of filtering rules that i wanted to apply to each of a set of feeds. My initial thought was to extract those rules so that i could reuse them in multiple pipes, each handling data from different sources. As it turns out, Pipes does not allow that. You can use other pipes as the input to a pipe, which is nice, but you cannot embed one pipe in the middle of another. The inability to create custom, reusable transformations is a real missed opportunity.
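For comparison, here is roughly what a reusable transformation looks like once the rules live in text. This is a minimal Python sketch under stated assumptions, not anything Pipes provides: the `Posting` record, the `keyword_rule`, `exclude_rule`, and `make_filter` helpers, and the sample keywords and feeds are all hypothetical. The filter is defined once and then applied to postings from two different sources.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Posting:
    """One job posting pulled from some feed (hypothetical shape)."""
    title: str
    description: str
    source: str


# A rule is just a predicate over a posting.
Rule = Callable[[Posting], bool]


def _text(p: Posting) -> str:
    # Rules match against title and description, case-insensitively.
    return f"{p.title} {p.description}".lower()


def keyword_rule(*keywords: str) -> Rule:
    """Accept postings that mention at least one of the keywords."""
    wanted = [k.lower() for k in keywords]
    return lambda p: any(k in _text(p) for k in wanted)


def exclude_rule(*keywords: str) -> Rule:
    """Reject postings that mention any of the keywords."""
    unwanted = [k.lower() for k in keywords]
    return lambda p: not any(k in _text(p) for k in unwanted)


def make_filter(*rules: Rule) -> Callable[[Iterable[Posting]], List[Posting]]:
    """Bundle rules into one reusable transformation; a posting must pass every rule."""
    def apply(postings: Iterable[Posting]) -> List[Posting]:
        return [p for p in postings if all(rule(p) for rule in rules)]
    return apply


# Defined once (hypothetical criteria)...
job_filter = make_filter(
    keyword_rule("ruby", "erlang"),
    exclude_rule("recruiter"),
)

# ...and reused for every source (sample data is made up).
site_a = [
    Posting("Ruby developer", "Rails shop", "site-a"),
    Posting("Java architect", "J2EE", "site-a"),
]
site_b = [
    Posting("Erlang engineer", "via recruiter", "site-b"),
    Posting("Erlang hacker", "distributed systems", "site-b"),
]

for name, feed in (("site-a", site_a), ("site-b", site_b)):
    print(name, [p.title for p in job_filter(feed)])
```

Because `job_filter` is an ordinary value in a text file, changing the criteria means editing one definition (and committing it to source control) rather than updating every pipe by hand.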

With that option off the table, i decided to create one ginormous pipe that aggregated all the feeds before running their contents through the filters. That works in principle, though it raised some factoring challenges in my case. (Some feeds required specialized processing that others did not.) Once i had constructed my very large pipe, i tried to save it, and this is what i got:

Problem Saving: problem parsing response

Which is a little weird, because it actually does seem to save when this happens: if i close the editor and reopen the pipe, it has the changes. Unfortunately, it cannot run the very large pipe reliably, or, most of the time, at all. Occasionally it returns only a subset of the data it should, but usually i get a “Pipes engine request failed (malformed engine data) (2)” error message. Regardless of the outcome, it always takes an unacceptable amount of time to run.

So now i am back at square one. I am faced with creating an expansive set of pipes that are largely duplicates of one another, without even cut-and-paste reusability. (Did i forget to mention that the pipe editor does not support cut-and-paste?) If i change my mind about the filtering criteria, i would have to change each of them by hand. I don’t think it would be worth the effort.

So, i guess i will go run my 40 searches manually every day and take the first job that comes along just to make the pain stop. Which really sucks because Pipes has real promise. After creating a small pipe i thought it was pretty much the coolest thing since sliced bread.

9 comments on “Pipes Dreams”

  1. -

    Hmm… Peter???

    How about you use your spare time, your text-editing skills, and your Ruby experience to construct a DSL for pipes? Then you can run it with hpricot/nokogiri/mechanize and the output will be an RSS feed.

    Just like Pipes, but instead of visual, buggy, and restricted, it would be open, flexible, fast, and reliable.

    You can use it to save the time you spend entering information into 40 websites.

  2. -

    Call it rb-pipes, host it on github, and everyone will say thank you.

  3. - Post author

    What free time? At the moment I am a stay-at-home dad with 2 kids to watch and a job to find. :) I could do what you are describing. In fact, I have considered it. I think it might be a cool project for exploring Erlang a little.

  4. -

    Hardly an expert, you’re biased.

  5. -

    […] March 1, 2009 Cloud Computing , IT , SaaS , Trends A few days ago, Stefan joined a rant of Peter Williams on Yahoo! Pipes lack of a text representation: While for many models (and programs, and anything in […]

  6. -

    You can insert pipes as modules within other pipes, when you put them into a loop operator. Check out Mat’s Technorati Authority pipe at http://mediaczar.com/blog/2009/01/technorati-authority-yahoo-pipe/.

    But I agree with you on the interface. It does frequently go wrong and it’s very frustrating.

    Instead of going back to fully manual, why not subscribe to the feeds using something like Google Reader which you can then filter? Or even create your aggregated feeds in Google Reader, send that to a Yahoo Pipe which has the filters you want in place, then subscribe to the output RSS from that?

    Good luck with the hunt.

  7. -

    “The inability to create custom reusable transformations is a real missed opportunity.”

    As Brendan says, you can use pipes in pipes (but only in the loop module). Indeed, it’s considered best practice to do this, and lots of Pipes developers create reusable objects like this. Most of the ones I’ve built recently around APIs like Technorati’s or Twitter’s use a modular pipe like this.

    But I agree: the ability to edit a pipe in a text editor would be brilliant. I see no real reason why there shouldn’t be a text representation of a pipe that’s accessible to the user, and I can see how powerful that would be.

    You might like to look at the FriendFeed Yahoo! Pipes list (http://friendfeed.com/list/pipes)

    Or see what I’ve been trying to do with Pipes (http://mediaczar.com/blog/category/pipes).
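The loop-module workaround described in the last two comments amounts to running a smaller, reusable pipe once per item of a larger one. A minimal Python sketch of that shape follows; `loop`, `domain_of`, and the sample items are hypothetical names for illustration and say nothing about how the Pipes engine actually implements its loop module.

```python
from typing import Callable, Dict, Iterable, List

Item = Dict[str, str]
SubPipe = Callable[[Item], str]


def loop(items: Iterable[Item], sub_pipe: SubPipe, assign_to: str) -> List[Item]:
    """Run sub_pipe once per item and attach its result under assign_to.

    Loosely modeled on the loop-module idea; not the Pipes engine itself.
    """
    results = []
    for item in items:
        enriched = dict(item)  # copy so the input items are left untouched
        enriched[assign_to] = sub_pipe(item)
        results.append(enriched)
    return results


def domain_of(item: Item) -> str:
    """Hypothetical reusable sub-pipe: extract the host from an item's link."""
    # "http://example.com/a".split("/") -> ["http:", "", "example.com", "a"]
    return item["link"].split("/")[2]


items = [
    {"title": "Post A", "link": "http://example.com/a"},
    {"title": "Post B", "link": "http://example.org/b"},
]

for row in loop(items, domain_of, "domain"):
    print(row["title"], row["domain"])
```

The same `domain_of` sub-pipe could be handed to `loop` from any number of outer pipelines, which is the kind of reuse the post was looking for.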
