A personal link mill

2011-12-30

I posted a link to an interesting visualization paper on Twitter today, prompting someone to ask me where I had found it. Sadly, I had to admit that I had no clue where I first saw it referenced, due to the way I consume links I find on the net. So, I thought I'd write a quick blog post to explain myself, and then pitch a product idea that could make my life (and maybe yours) much easier.

First, the problem statement: my aim is to efficiently discover links to interesting stuff on the net. Simple as that. A few years ago, my flow of links came mostly from social news sites (Hacker News and Reddit), and from items shared by people I follow on social networks. Over time, I became more and more disenchanted with this way of doing things. The social news approach is to take a torrent of very low-quality links (user submissions), and then crowd-source the filtering through voting. But popularity is not a good measure of information quality, and the result is a bland, lowest-common-denominator view of the world that has no room for anything that doesn't make it to the front page. Don't get me wrong - Reddit and HN do a lot of other things well - but they just don't cut it as primary information sources.

Mining links from social networks is a more promising approach, but still problematic. None of the social networks provide the tools needed to extract shared links from the update stream and consume them efficiently. There is also a structural issue - I don't necessarily want to mix my social ties and my information sources, and I definitely don't want to be limited to a single platform. These are separate functions that I feel require separate tools.

Eventually, I took matters into my own hands. First, I hugely broadened the number of information sources I consumed. The tool I use for this is Google Reader - I now subscribe to about 800 individual feeds, and this number is growing daily. The trick here is to find high-quality, low-volume link sources. The motherlode of good links for me was to be found on social bookmarking sites. About 700 of my subscriptions are to the RSS feeds of individual users on Pinboard and Delicious. This gives me very fine control and a great mix of interests. Plus, getting links from individual curators handily sidesteps the social news group-think problem. The remainder of my subscriptions are split between blogs, some sub-Reddits, a few Twitter users and subsections of arXiv.
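As an aside, wiring up this kind of intake programmatically is not hard. Here's a minimal sketch in Python using the feedparser library - the usernames are placeholders, and the feed URL patterns are my assumptions about what Pinboard and Delicious expose, so check before relying on them:

```python
import feedparser  # pip install feedparser

# Per-user bookmark feeds. Usernames are made up, and the URL patterns
# are assumptions about the feeds these sites expose, not a spec.
FEEDS = [
    "https://feeds.pinboard.in/rss/u:someuser/",
    "http://feeds.delicious.com/v2/rss/someotheruser",
]

for url in FEEDS:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        # Each entry is one link bookmarked by that curator.
        print(entry.title, entry.link)
```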

So much for how my intake works. Just as important is the way that I consume it. I do my "filtering" in batches, usually in the evening. Using Reeder on my iPad works well for me, letting me flick quickly and comfortably through all the new links of the day. When I find something that looks interesting, I resist the temptation to read it then and there - instead, I batch up all my reading for later. If it's a web page, it goes to Instapaper. If it's a PDF, it gets downloaded into a Dropbox folder, which is synced to GoodReader.

Finally, the actual reading. Every morning, I toddle off to a nice cafe with my iPad, and read all the interesting stuff I saved the previous day in a single sitting. I'm ruthless about just skimming things that don't warrant careful attention. If I find something particularly interesting, I save it permanently, and perhaps tweet it or mail it to someone I think might be interested.

Problems - and a product idea?

This system works for me, but it has many problems. There's no end-to-end coordination, so by the time I sit down to actually read something, I have no easy way to tell which feed it came from. Google Reader sucks at managing hundreds of low-volume subscriptions. Reeder is great, but it's not tailored to consuming overlapping links from many sources. The end result is that maintaining the system I have is a time-consuming pain in the ass. The fact that it's still worth it despite this makes me think there might be commercial room for a better solution.

Which brings me to a rough product idea - a formalized version of this link mill for people who want to take direct control of their information intake. The business end is a generalized feed consumer, letting you subscribe to RSS feeds, Twitter users, Google+ updates, sub-Reddits and other information sources. Links are extracted from these feeds, keeping track of which links appeared where. The user is then presented with a stream of links to consume, de-duplicated so that links appearing in multiple feeds are presented only once. The system keeps track of links the user marks as "interesting", batching them for later consumption. It also uses this information to score the feeds, letting the user see which feeds are low quality and should be ditched. Given the right tools, the time needed for a user to maintain and tend their link feed garden would be quite modest, and the rewards would be great.
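To make that concrete, here's a rough sketch of the core loop in Python - a thinking aid rather than a design. The feedparser library stands in for the generalized consumer, and a simple interesting-per-carried ratio stands in for the scoring; all the names and details here are mine, not a spec:

```python
import feedparser  # assumed dependency; any feed-fetching layer would do
from collections import defaultdict


class LinkMill:
    """Minimal sketch of the de-duplicating, feed-scoring consumer."""

    def __init__(self, feed_urls):
        self.feed_urls = list(feed_urls)
        self.sources = defaultdict(set)  # link -> feeds it appeared in
        self.seen = set()                # links already surfaced to the user
        self.carried = defaultdict(int)  # feed -> number of links it carried
        self.hits = defaultdict(int)     # feed -> links marked "interesting"

    def poll(self):
        """Fetch every feed, returning each new link exactly once."""
        fresh = []
        for url in self.feed_urls:
            for entry in feedparser.parse(url).entries:
                link = entry.link
                if url not in self.sources[link]:
                    self.sources[link].add(url)  # remember where it appeared
                    self.carried[url] += 1
                if link not in self.seen:        # de-duplicate across feeds
                    self.seen.add(link)
                    fresh.append(link)
        return fresh

    def mark_interesting(self, link):
        """Credit every feed that carried a link the user flagged."""
        for url in self.sources.get(link, ()):
            self.hits[url] += 1

    def feed_scores(self):
        """Interesting-per-carried ratio; low scorers are candidates to ditch."""
        return {url: self.hits[url] / float(max(self.carried[url], 1))
                for url in self.feed_urls}
```

Scoring by hit rate rather than raw hit count is deliberate: it penalizes noisy, high-volume feeds that occasionally strike gold, which is exactly the kind of source this whole system exists to avoid.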

If someone built this, I for one would gladly fork over some of my hard-earned doubloons to use it. In fact, with some validation of the idea and a few collaborators, I might think of building it myself. Does this sound useful to anyone else?