Twitstream

Prompted by a tweet by Simon Willison on Monday, I was intrigued to hear about the Twitter real-time streaming APIs. In spare moments this week, rather than surfing the web, I found myself looking at how to get a view on the API from within Python which was… not trivial. In fact, none of the standard libraries seemed to handle the API at all: every HTTP access library waited until the stream closed, which was potentially forever.

A little poking and prodding, and I knew Twisted was capable of doing it. However, that seemed too heavy-weight a solution for just hacking about. I discovered asyncore, and despite the fairly thin documentation and examples online, it seemed clear that it could help. By evening I had knocked together something that worked with Twitter’s basic authentication and the proxy at work, which pretty much meant that I had to create my own basic HTTP/1.0 client. Not rocket science — I guess I had done the same thing 14 years earlier in Perl4 — but it took some trial and error, and neither the asynchat library nor any other common libraries did anything to simplify putting together an HTTP request.

I posted things up on github just to get some feedback on whether what I was doing was hopelessly misguided and reinventing the wheel, and — other than hearing it confirmed that Twisted could do this easily — it turns out no one else was doing this in Python. Cool, because they’re nifty APIs. It seems Twitter had turned into quite the target this year, with both Facebook on one hand and Google on another looking to get in on the action. I think the simplicity, transparency, and speed of the API are brilliant responses to things like Google Wave. This is very easy to work with, will keep developers around, and is pregnant with possibilities.

Oh, the speed. The fact that there is no apparent latency — a message that I send through my desktop client will appear in the stream before the miniature posting window even closes — makes this tremendously satisfying to work with.

After returning to the project like a bad rash, aided by encouraging words, github followers, and a primal pleasure in seeing words from all around the world spontaneously crawl up my screen, I’ve got a good chunk of code. It’s not brilliantly engineered, but I really enjoyed the process of it, the way it grew organically while trying it with different applications, such as a twistori clone to track (and highlight) multiple keywords in the Twitter stream.

There’s also fixreplies.py, a read-only client that mines your Friends, Followers, Favorites, and conversations to find people to track all public tweets to and from. While I think Twitter did the right thing in limiting replies, this makes for an interesting adjunct to a main, traditional twitter client. You get the feel for a lot more conversations sliding by. As it’s only text in the terminal, it feels more ephemeral, with far less of a need to catch up on everything.

Mostly as an exercise to see if it could be done, I also turned that client into a double-clickable script in Mac OS X. Click, and it opens up a terminal window, asks you for some information on your account, what sorts of users interest you, and how to find up to two hundred users’ conversations to listen in on.

I do hope that Twitter makes this available for desktop clients and the general public, because it really expands the, er, Twitterverse, and opens up new possibilities in interacting with and getting data from Twitter.