atom-tools’ *NIX tools

atom-tools’ bin directory has several UNIXy tools that I’ve never really mentioned before.

These tools operate on “collections”. I’m using the term in a broader sense than RFC 5023; a “collection” can be an AtomPub Collection, a directory containing Atom Entries, or a feed on stdin or stdout.
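To make the three meanings of “collection” concrete, here is a minimal sketch of how a tool could decide what its argument refers to. The method name `collection_kind` and the rule it applies (“-” for stdin/stdout, an http(s) URL for an AtomPub Collection, anything else for a directory of Entries) are assumptions for illustration, not atom-tools’ actual code.

```ruby
require "uri"

# Hypothetical helper: classify a command-line argument as one of the
# three kinds of "collection" described above.
def collection_kind(arg)
  return :stdio if arg == "-"            # feed on stdin or stdout

  begin
    uri = URI.parse(arg)
  rescue URI::InvalidURIError
    return :directory                    # not a URI at all: treat as a path
  end

  # An http(s) URL names an AtomPub Collection; anything else is a directory
  %w[http https].include?(uri.scheme) ? :app_collection : :directory
end
```

With this rule, `atom-cp http://example.org/coll ./backup/` would read from an `:app_collection` and write to a `:directory`.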

atom-cp

atom-cp copies the contents of a source collection to a destination collection.

atom-grep

atom-grep prints a feed to stdout containing all the entries in the source feed that match a given regexp.
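The filtering step can be sketched with REXML. Which fields atom-grep actually tests isn’t stated here, so this sketch simply matches the regexp against all of an entry’s text; `grep_feed` is a hypothetical name.

```ruby
require "rexml/document"

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical sketch: keep the entries whose text matches the pattern,
# drop the others, and return the resulting feed as a string.
def grep_feed(feed_xml, pattern)
  doc = REXML::Document.new(feed_xml)
  entries = REXML::XPath.match(doc.root, "a:entry", { "a" => ATOM_NS })
  entries.each do |entry|
    # Join every text node inside the entry (title, content, author, ...)
    text = REXML::XPath.match(entry, ".//text()").map(&:value).join(" ")
    doc.root.delete_element(entry) unless pattern.match?(text)
  end
  doc.to_s
end
```

Feed-level metadata (the feed’s own title, id and so on) is left untouched, so the output is still a valid feed that can be piped into another tool.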

atom-purge

atom-purge DELETEs every entry in the given collection.
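In AtomPub terms, deleting an entry means sending DELETE to its member URI, which the entry advertises as `link[@rel="edit"]`. A minimal sketch, assuming the feed has already been fetched (or read from stdin) and ignoring authentication and `rel="next"` paging, which a real tool would need; `edit_uris` and `purge` are hypothetical names.

```ruby
require "net/http"
require "rexml/document"
require "uri"

ATOM_NS = "http://www.w3.org/2005/Atom"

# Collect each entry's member URI (its link[@rel="edit"]), resolved against
# the collection URL in case the hrefs are relative.
def edit_uris(feed_xml, collection_url)
  root = REXML::Document.new(feed_xml).root
  REXML::XPath.match(root, "a:entry/a:link[@rel='edit']", { "a" => ATOM_NS })
              .map { |link| URI.join(collection_url, link.attributes["href"]).to_s }
end

# Send one DELETE per member URI; returns [uri, status code] pairs.
def purge(uris)
  uris.map do |u|
    uri = URI(u)
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(Net::HTTP::Delete.new(uri))
    end
    [u, res.code]
  end
end
```

For the `atom-purge -` case, the feed would come from `$stdin.read`.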

atom-post

atom-post POSTs the contents of a file or stdin to a given URL.

It doesn’t do anything fancy; it’s just a convenient way of getting media, or an Entry created by some other tool, onto the web. Eventually it will construct Entries too, but not yet.
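What atom-post does boils down to a single HTTP request. Here is a minimal sketch with Ruby’s `net/http`; the method names are hypothetical, and the optional Slug: header (defined in RFC 5023) is an addition for illustration, not something atom-post is said to send.

```ruby
require "net/http"
require "uri"

# Build a POST carrying raw bytes (a media file, or an Entry produced by
# another tool) with the given MIME type. Slug is an optional name hint.
def build_post(url, body, content_type, slug: nil)
  req = Net::HTTP::Post.new(URI(url))
  req["Content-Type"] = content_type
  req["Slug"] = slug if slug
  req.body = body
  req
end

# Send it; a 201 Created response means the server made a new member.
def post_to_collection(url, body, content_type, **opts)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_post(url, body, content_type, **opts))
  end
end
```

For example, `post_to_collection("http://example.org/media", File.binread("icon.png"), "image/png")`, or `$stdin.read` with `"application/atom+xml;type=entry"` for an Entry coming down a pipe.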

Use Cases

Back up your blog:

atom-cp http://example.org/coll ./backup/

Restore a backup, or post several pre-created entries:

atom-cp ./backup/ http://example.org/coll

Delete spam:

atom-grep poker http://example.org/comments | atom-purge -

Plagiarise somebody else’s blog:

atom-grep "popular content" http://example.org/coll | atom-cp - http://example.com/seo

Post a picture to your media collection:

atom-post -m image/png http://example.org/media icon.png

(Disclaimer: XML is a terrible format to pass down a pipe. It’s awfully convenient though, and my pipes and my collections haven’t been long enough for it to be a problem.)

Too many Ruby Atom libraries

Ruby now has three Atom Publishing Protocol libraries: atom-tools, atomutil and ratom. They all use the same Atom module namespace, and they’re all incompatible. Not a great situation.

ratom uses libxml-ruby rather than REXML, and it’s embarrassingly faster than atom-tools: ratom can parse the 1100 Atom Entries in my Venus cache in 0.25s, while atom-tools takes 6s.

I haven’t been able to get atomutil to work. I expect it is faster than atom-tools too, though not nearly so dramatically: atom-tools converts the parsed XML tree into a tree of Ruby objects instead of just wrapping it. This may have been a mistake.
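The difference between the two designs can be sketched as follows. Both halves use REXML purely to show structure; this is not atom-tools’, ratom’s or atomutil’s actual code, and it says nothing about libxml-ruby versus REXML speed. `EagerEntry` and `WrappedEntry` are hypothetical names.

```ruby
require "rexml/document"

ATOM_NS = "http://www.w3.org/2005/Atom"

# Converting: copy values out of the XML into plain Ruby objects up front.
# Every element pays this cost whether or not anyone reads it.
EagerEntry = Struct.new(:title, :id)

def eager_entry(el)
  ns = { "a" => ATOM_NS }
  EagerEntry.new(REXML::XPath.first(el, "a:title", ns)&.text,
                 REXML::XPath.first(el, "a:id", ns)&.text)
end

# Wrapping: keep the parsed tree and look values up only when asked.
# Elements nobody asks about (extensions, say) stay in the tree untouched.
class WrappedEntry
  def initialize(element)
    @el = element
  end

  def title
    text_of("a:title")
  end

  def id
    text_of("a:id")
  end

  private

  def text_of(path)
    REXML::XPath.first(@el, path, { "a" => ATOM_NS })&.text
  end
end
```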

atom-tools 2.0

I published atom-tools 2.0 a few days ago.

The XML parsing and building has been completely reworked. This should make handling extensions much easier.

A big feature for people writing clients is HTTP caching support (ported from Joe Gregorio’s httplib2). I actually committed this to the darcs repository the same day I published 1.0, but never released a 1.1 version (oops!).
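The core of this kind of caching is validation: remember each response’s ETag and Last-Modified, send them back as If-None-Match and If-Modified-Since, and reuse the stored body on 304 Not Modified. A minimal sketch of just that part (httplib2 also handles Cache-Control freshness, which is omitted here); `CachingGetter` is a hypothetical class, not atom-tools’ API.

```ruby
require "net/http"
require "uri"

# Hypothetical in-memory validation cache for GET requests.
class CachingGetter
  CachedResponse = Struct.new(:etag, :last_modified, :body)

  def initialize
    @cache = {}
  end

  # Build a GET that lets the server answer 304 if our copy is still current.
  def conditional_get(url)
    req = Net::HTTP::Get.new(URI(url))
    if (cached = @cache[url])
      req["If-None-Match"] = cached.etag if cached.etag
      req["If-Modified-Since"] = cached.last_modified if cached.last_modified
    end
    req
  end

  # Update the cache from a response and return the body to use.
  def handle(url, res)
    case res.code
    when "304" then @cache.fetch(url).body   # not modified: reuse stored body
    when "200"
      @cache[url] = CachedResponse.new(res["ETag"], res["Last-Modified"], res.body)
      res.body
    else
      res.body                               # errors are not cached
    end
  end

  def get(url)
    uri = URI(url)
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
      http.request(conditional_get(url))
    end
    handle(url, res)
  end
end
```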

It’s got some new UNIX-y tools that I’ll write about later.

I’ve gotten rid of the YAML mapping entirely; it wasn’t as human-readable or human-writable as I had hoped. I think there’s promise in doing something with Maruku for that.

I’m slowly moving from Test::Unit to rspec (thanks to a lot of grunt work by Simon Rozet). I think the result is a lot cleaner.

I’ve tried to keep things as backwards-compatible as possible, but some things have changed since the 1.0 release. The main difference is that Atom::Collection now represents an app:collection element, instead of just being a fancy Atom::Feed.
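For reference, an app:collection element (RFC 5023) carries an href, an atom:title and zero or more app:accept elements; with no app:accept, the collection accepts Atom Entries. A minimal REXML sketch of reading one, using a hypothetical `AppCollection` struct rather than atom-tools’ Atom::Collection API, and not resolving xml:base:

```ruby
require "rexml/document"

APP_NAMESPACES = {
  "app"  => "http://www.w3.org/2007/app",
  "atom" => "http://www.w3.org/2005/Atom"
}.freeze

# RFC 5023: no app:accept element means "Atom Entries only".
DEFAULT_ACCEPT = ["application/atom+xml;type=entry"].freeze

# Hypothetical value object for an app:collection element.
AppCollection = Struct.new(:href, :title, :accepts)

def parse_collection(el)
  accepts = REXML::XPath.match(el, "app:accept", APP_NAMESPACES).map(&:text)
  AppCollection.new(
    el.attributes["href"],                               # may be relative
    REXML::XPath.first(el, "atom:title", APP_NAMESPACES)&.text,
    accepts.empty? ? DEFAULT_ACCEPT : accepts
  )
end
```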

PushPin 2.0

I’ve rewritten PushPin, my Atom Publishing Protocol client. I’ve moved from Camping to Rails, giving the application some much-needed structure.

Major new features:

  • stored passwords are encrypted with AES
  • media collections
  • service document autodiscovery
  • AuthSub (I’m not sure if RFC 5023 support has been pushed onto mainline Blogger yet, though)
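The list doesn’t say which AES mode or key handling PushPin uses. As a hedged sketch only, here is one reasonable way to encrypt a stored password with AES-256-GCM using Ruby’s standard `openssl` library; the method names are hypothetical, and where the key lives (somewhere other than the database holding the ciphertext) is left as an assumption.

```ruby
require "openssl"
require "securerandom"

# Encrypt a password; store all three returned values alongside each other.
def encrypt_password(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv                 # fresh nonce for every encryption
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv, cipher.auth_tag, ciphertext]
end

# Decrypt; raises OpenSSL::Cipher::CipherError if anything was tampered with.
def decrypt_password(iv, tag, ciphertext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.auth_tag = tag
  (cipher.update(ciphertext) + cipher.final).force_encoding("UTF-8")
end
```

GCM is chosen here because it authenticates as well as encrypts, so a modified ciphertext fails to decrypt instead of yielding a garbled password.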

The UI should be much more polished, too (although there’s still lots of room for improvement).

Since I’ve got this blog’s comments coming in via AtomPub, they’ll be broken until I implement OAuth.

New Ruby Atom Library

A new Ruby library for the Atom format and Publishing Protocol, atomutil, has just been released. I haven’t had a chance to look at it deeply yet (thanks to final exams), but it’s well documented, and it looks like it has good support for extension elements (much better than atom-tools has at the moment).

Laboratory: Commenting via the Atom Protocol

I have (yet again) reimplemented comments on this blog. The twist: comments are submitted via the Atom Publishing Protocol.

Each of this blog’s individual entries has a link[@rel="replies"] that points at an APP collection that anyone can POST to. Right now, each of those collections is a subset of http://necronomicorp.com/lab/comments, which is an APP collection running on a slightly modified version of the software behind this blog.
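On the client side, finding where to send a reply is a single lookup. A minimal sketch (the method name and example URLs are hypothetical) that finds an entry’s `link[@rel="replies"]` and resolves it against the entry’s own URI:

```ruby
require "rexml/document"
require "uri"

ATOM_NS = "http://www.w3.org/2005/Atom"

# Return the absolute URI of the entry's replies collection, or nil.
def replies_collection(entry_xml, entry_uri)
  entry = REXML::Document.new(entry_xml).root
  link = REXML::XPath.first(entry, "a:link[@rel='replies']", { "a" => ATOM_NS })
  link && URI.join(entry_uri, link.attributes["href"]).to_s
end
```

A reply is then just an Atom Entry POSTed to that URI.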

POSTed entries get thr:in-reply-to/@ref filled in automatically, so that clients don’t need to worry about trying to figure out the proper atom:id to use.
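On the server side, filling in the reply metadata might look like the sketch below, using the Atom Threading Extensions (RFC 4685) namespace. Replacing any client-supplied thr:in-reply-to, rather than only adding a missing one, is an assumption of this sketch; `add_in_reply_to` is a hypothetical name.

```ruby
require "rexml/document"

THR_NS = "http://purl.org/syndication/thread/1.0"

# Give a POSTed entry a thr:in-reply-to whose @ref is the parent's atom:id.
def add_in_reply_to(entry_xml, parent_id, parent_href = nil)
  doc = REXML::Document.new(entry_xml)
  entry = doc.root
  # Drop whatever the client sent so the server's value wins
  REXML::XPath.match(entry, "t:in-reply-to", { "t" => THR_NS }).each do |old|
    entry.delete_element(old)
  end
  irt = entry.add_element("thr:in-reply-to", { "xmlns:thr" => THR_NS, "ref" => parent_id })
  irt.add_attribute("href", parent_href) if parent_href
  doc.to_s
end
```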

Since I don’t expect the world to adopt this overnight, I’m using PushPin to provide a traditional HTML forms interface.

There are a few things I like about this. I can use any APP client to edit the collection, which means I don’t need to write up an interface for it. More interestingly, you can (at least in theory) read and reply to an entry without ever seeing a web page. It’s Usenet 2.0!

Some questions arise:

  • “replies” seems like the right way to point at a GETtable collection, but how do I denote that the collection is POSTable?
  • how do trackbacks fit into this?

“Atom Authentication” sucks.

While the Atom Publishing Protocol leaves implementations free to use whatever method of authentication they please, historical circumstances have associated the Protocol with an obscure authentication method that has neither a name nor a specification.

I refer of course to the method described in this article. I hesitate to call the method Atom Authentication (because that suggests a closer association with the Protocol than exists), but I can’t come up with anything better.

The idea is sound: it functions as designed and has a few advantages over Digest authentication (mostly for people stuck with crappy hosts). The algorithm is well described in WS-Security’s UsernameToken profile (which “Atom Authentication” is based on). The general form of the challenge and response is described in the Mark Pilgrim article.

But common practice and the article have diverged. While the article (and Sam Ruby) produces an ASCII nonce and sticks it straight in the X-WSSE header, all the client code I’ve seen base64s the nonce before it goes in the header, and servers seem to require it. I can’t find this specified anywhere.
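To pin down the practice described above, here is a sketch of building the X-WSSE value the way client code commonly does: PasswordDigest = Base64(SHA1(nonce + created + password)) computed over the raw nonce, with the nonce itself base64-encoded in the header. Since this isn’t specified anywhere, treat it as an assumption and check what your server expects; `wsse_header` is a hypothetical name.

```ruby
require "digest/sha1"
require "securerandom"
require "time"

# Strict (no newlines) base64, via Array#pack
def b64(bytes)
  [bytes].pack("m0")
end

# Build the X-WSSE header value (sent with  Authorization: WSSE profile="UsernameToken").
def wsse_header(username, password, nonce: SecureRandom.random_bytes(16),
                created: Time.now.utc.iso8601)
  # Digest is over the raw nonce bytes; .b avoids encoding clashes
  digest = b64(Digest::SHA1.digest(nonce.b + created.b + password.b))
  fields = {
    "Username"       => username,
    "PasswordDigest" => digest,
    "Nonce"          => b64(nonce),   # base64 in the header: the common practice
    "Created"        => created
  }
  "UsernameToken " + fields.map { |k, v| %(#{k}="#{v}") }.join(", ")
end
```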

This sucks. Don’t use “Atom Authentication” unless you absolutely must, and if you absolutely must please write a proper spec first.

hReview in PushPin

Instead of studying, I added rudimentary hReview support to PushPin. Try it out.

It’s starting to get kind of big; it may be time to graduate from Camping to Rails.

Stupid Atom Tricks: XMPP Bot

Continuing my habit of writing x-to-Atom gateways, here’s a little Jabber bot that will post messages you send it to an Atom Publishing Protocol Collection.

It requires atom-tools and the xmpp4r library.

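require "atom/collection"
require "xmpp4r"

# The Atom Publishing Protocol collection to post to, and its credentials
COLL_USER = "bct"
COLL_PASS = "atom-password"
COLL_URL  = "http://necronomicorp.com/testatom?app"

# The bot's own Jabber account
BOT_JID  = "test@atompub.necronomicorp.com"
BOT_PASS = "xmpp-password"

http = Atom::HTTP.new
http.user = COLL_USER
http.pass = COLL_PASS

coll = Atom::Collection.new COLL_URL, http

cl = Jabber::Client.new(Jabber::JID.new(BOT_JID))
cl.connect
cl.auth(BOT_PASS)

# Announce that the bot is available so the server routes messages to it
cl.send(Jabber::Presence.new)

cl.add_message_callback do |msg|
  # Ignore messages with no text (e.g. typing notifications)
  next if msg.body.nil? || msg.body.strip.empty?

  e = Atom::Entry.new
  e.content = msg.body

  # POST the entry; 201 Created means the collection accepted it
  res = coll.post! e

  reply = res.code == "201" ? "success!" : "#{res.code} #{res.message}"
  cl.send(Jabber::Message.new(msg.from, reply))
end

# Keep the main thread alive; xmpp4r delivers messages on its own thread
Thread.stop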

Try it out; send a message to test@atompub.necronomicorp.com. It should appear in my test collection.

(The real version is a server component which handles several collections on different JIDs.)

Introducing PushPin

PushPin is a web-based Atom Publishing Protocol client. The goal is to provide a simple way for users to work with the Protocol, but the really nice thing about it is that it allows you (the server developer) to stop worrying about all those users who haven’t got a client.

Right now it creates, edits and deletes entries. If you’ve logged in with your OpenID, it will store your authentication details and list of collections. The basics are there but there’s still a lot to do.

I’ve put up a test collection that demonstrates how it can be used inside another web app, acting as sort of a web forms ↔ Atom gateway. More on that when things have stabilised.

Please give it a spin. I haven’t set up any sort of bug tracking yet, so send suggestions and bug reports directly to me.

Things that are coming in the vague future:

  • grabbing collections from introspection documents
  • app:draft and the Slug: header
  • a variety of purpose-specific entry editors (e.g. WYSIWYG, hReview, …)
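Both of those features are defined by RFC 5023: a draft is an entry with app:control/app:draft set to “yes”, and Slug: is a request header suggesting a name for the new member’s URI. A minimal sketch of a client building such a request; the method names and URLs are hypothetical, and a server is free to ignore the Slug.

```ruby
require "net/http"
require "rexml/document"
require "uri"

APP_NS  = "http://www.w3.org/2007/app"
ATOM_NS = "http://www.w3.org/2005/Atom"

# Build an Atom Entry marked as a draft.
def draft_entry(title, text)
  entry = REXML::Element.new("entry")
  entry.add_namespace(ATOM_NS)                 # default namespace: Atom
  entry.add_namespace("app", APP_NS)
  entry.add_element("title").text = title
  entry.add_element("content", { "type" => "text" }).text = text
  entry.add_element("app:control").add_element("app:draft").text = "yes"
  entry
end

# POST the entry to a collection, suggesting a URI name via Slug:.
def draft_post_request(collection_url, entry, slug)
  req = Net::HTTP::Post.new(URI(collection_url))
  req["Content-Type"] = "application/atom+xml;type=entry"
  req["Slug"] = slug
  req.body = entry.to_s
  req
end
```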

“Atom Publishing Protocol” and “Atom API”: not synonyms

Back in 2003 there was a lot of buzz surrounding the (then new) Atom syndication format and its sister, the Atom API. Mark Pilgrim published an article about the API on XML.com, it was implemented in some high-profile applications (including Blogger and TypePad) and it generally wormed its way into people’s brains.

Shortly afterward its name was changed to the Atom Publishing Protocol, erasing all that lovely brand recognition. Since then, the specifics of the Protocol have changed significantly. Most notably, the API is based on a draft version of the Atom format (now deprecated), while the Protocol is based on Atom 1.0. The two formats are just different enough to cause problems (barring extremely lenient software).
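One practical symptom of the mismatch is the XML namespace: Atom 1.0 documents use `http://www.w3.org/2005/Atom`, while the pre-1.0 Atom 0.3 drafts used by Atom API implementations use `http://purl.org/atom/ns#`. A minimal sketch of telling them apart by the root element’s namespace; `atom_version` is a hypothetical name.

```ruby
require "rexml/document"

ATOM_NAMESPACES = {
  "http://www.w3.org/2005/Atom" => :atom_1_0,   # Atom 1.0 (RFC 4287)
  "http://purl.org/atom/ns#"    => :atom_0_3    # pre-1.0 draft format
}.freeze

# Identify which Atom format a document uses from its root namespace.
def atom_version(xml)
  root = REXML::Document.new(xml).root
  ATOM_NAMESPACES.fetch(root.namespace, :unknown)
end
```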

The similarity of the two names and the plethora of existing material about the API (Google turns up twice as many results for “atom api” as for “atom publishing protocol”) are already confusing people. The Protocol will be an RFC any day now*, and it won’t be good for anyone if implementors are looking at the wrong specs and users at the wrong clients.

I’ve seen this mistake made several times recently (1, 2); let’s try to end the confusion.

* I’ve been saying that for months.

atom-tools 0.9.0 on its way

The version number’s a bit of a jump, but only because I’d like to hit 1.0 at about the same time the Publishing Protocol makes it to RFC.

No real changelog, I’m afraid, but here’s the big stuff:

One of these days I’m going to have to set up Trac.