Human Shardable Apps: Designing for Perpetuity

Yea, not so much. (CC by jcarbaugh)

There’s a bright orange Gowalla shirt in my closet.  There’s a Gowalla sticker on the door of the painfully named Suburbia Torchy’s Tacos, exhorting you to check in.  There might even still be a Gowalla app installed on your iPhone.  But Gowalla is no more.  When it was shutting down after the team was acqui-hired by Facebook, they claimed to be working on a way to let users download their data: photos, status messages and check-ins.  That never happened.

Those of us in the web startup community don’t spend much time thinking about the legacy our applications will leave.  We rush to new technologies and platforms without a thought to what will happen when the investors pull their cash or the company pivots to selling speakers out of the back of a truck.  Just like we’ve embraced things like scalability, test suites, and code maintainability, it’s time to start taking our software legacy seriously.  It’s time to start thinking about our responsibility to our users, not as table IDs or profiles, but as human beings.

I’m as guilty of ignoring this issue as anyone.  From 2006 to 2010 I led a team at Polycot that built and hosted the Specialized Riders Club, a social network for riders of Specialized Bicycle Components gear.  We were a contract development shop, so aside from our monthly hosting budget, we only got paid for big new development projects, like adding photo and video sharing or internationalization.  When we designed new features we never discussed what would become of them if the site were shut down, and we didn’t budget money for shutdown contingencies or user data exports.

When the time came to shut the site down and migrate the Riders Club to a new platform, we sent a notification out to users.  They were given a few months to archive any content from the site the old-fashioned way: copying and pasting, or right-clicking and saving.  Then it was gone.  Admittedly, the number of active users we had at the Riders Club is dwarfed by the number Gowalla had, but the same responsibility applies.  If we’d gotten export requests we would have pulled the data and sent it on, but we need to start thinking about the data our users entrust us with from the start.  By asking them to share their content with us, we take on a responsibility to them.

Bruce Sterling talked about this in his 2010 closing talk, and Jason Scott of Archive Team and the Internet Archive gave a great talk about it at dConstruct 2012; I suggest you give it a listen.  Archive Team tries to collect sites that are destined for the trash heap, archiving things like Fortune City, GeoCities and MobileMe.  They have a VM you can download that runs their automated scraper tool.  It’s a pretty cool hack, but the fact that Archive Team even has to exist is a testament to how bad we are at considering our legacy.

Historically, few sites offered useful data exports, and those that did used formats you’d need to write your own application to make use of.  37signals’ Basecamp had XML exports, but no HTML option as of 2009.  Facebook added a data export option in 2010, and it’s getting better, but I don’t believe it’s in an application-friendly format.  Twitter is finally rolling one out for its users, but it’s been three months and I still can’t export mine.  Even if I have mine and you have yours, there’s no way for us to put the two together and get any networked value.  They’re designed for offline reading or data processing, not so the spirit and utility of the service can live on.
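To make that distinction concrete, here’s a minimal sketch of what an application-friendly export could look like: a self-describing JSON bundle with stable IDs and ISO 8601 timestamps, so another program (or another user’s export) can merge it later.  The schema name, fields and filenames below are entirely hypothetical, not any real service’s format.

```python
import json
import os
from datetime import datetime, timezone

def write_export(user, checkins, path):
    """Write a self-describing, machine-readable export bundle.
    Hypothetical schema: stable IDs and ISO 8601 timestamps so another
    program (or another user's export) can merge this data later."""
    bundle = {
        "schema": "example-checkin-export/1",
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "checkins": [
            {
                "id": c["id"],
                "spot": c["spot"],
                "created_at": c["created_at"],  # ISO 8601
                "note": c.get("note", ""),
            }
            for c in checkins
        ],
    }
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(bundle, f, indent=2, ensure_ascii=False)

# Example: one user's export, ready to be merged with anyone else's.
write_export(
    "jon",
    [{"id": "c1", "spot": "Torchy's Tacos", "created_at": "2010-05-01T18:22:00Z"}],
    "exports/jon.json",
)
```

The exact format matters far less than the properties: machine-readable, self-describing, and mergeable with other people’s exports.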

Especially as web applications get more dynamic and collaborative, I think we need to start giving users a program they can use interactively, or even one that can combine multiple data exports into a mini-version of the site.  If your application’s simple, maybe that’s a stripped-down Python or Ruby app you can access with a web browser.  If your application’s complex, maybe it’s an i386-based VM: spin it up and you have a complete site environment that can import the data exports from your live site, maybe even as many exports as you have access to.  You should already have something like this to get your developers up to speed quickly; it shouldn’t be too hard to repurpose it for users.

You may say, “But my code is proprietary, why would I want to share it?”  Most sites don’t really do anything special in software.  Gowalla might have had a unique ranking algorithm, but you can pull that out of a public release.  If your code is so terrible that you wouldn’t want it up on GitHub, you have other problems, but don’t let that stop you.  Bad code is better than no code.  When you start a project, make an implicit pact with your users: they’ll take care of you if you take care of them.
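As a sketch of the “stripped-down app” idea, here’s a standard-library-only Python server that loads one or more of the hypothetical JSON exports from above and serves a read-only, browsable view of the merged check-ins.  Again, every path and field name here is an assumption for illustration.

```python
#!/usr/bin/env python3
"""A minimal post-shutdown "mini-site": load one or more JSON export
bundles (hypothetical format) and serve a read-only, browsable view of
the combined check-ins. Standard library only."""
import json
import sys
from glob import glob
from html import escape
from http.server import HTTPServer, BaseHTTPRequestHandler

def load_exports(pattern="exports/*.json"):
    """Merge every export bundle we can find into one list of check-ins."""
    checkins = []
    for path in glob(pattern):
        with open(path, encoding="utf-8") as f:
            bundle = json.load(f)  # assumed shape: {"user": ..., "checkins": [...]}
        for c in bundle.get("checkins", []):
            c["user"] = bundle.get("user", "unknown")
            checkins.append(c)
    return sorted(checkins, key=lambda c: c.get("created_at", ""))

CHECKINS = load_exports(sys.argv[1] if len(sys.argv) > 1 else "exports/*.json")

class ArchiveHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        rows = "".join(
            f"<li>{escape(c['user'])} checked in at {escape(c.get('spot', '?'))} "
            f"on {escape(c.get('created_at', '?'))}</li>"
            for c in CHECKINS
        )
        body = f"<html><body><h1>Archived check-ins</h1><ul>{rows}</ul></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    # Browse the archive at http://127.0.0.1:8000/ long after the real site is gone.
    HTTPServer(("127.0.0.1", 8000), ArchiveHandler).serve_forever()
```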

In Aaron Cope’s time pixels keynote at the New Zealand National Digital Forum, he talks about downloading archives of his Flickr photos with a project of his called Parallel Flickr (here’s the related conference talk and blog post), and about the idea that if we could download our contacts’ photos too, perhaps we could re-assemble a useful web of photos when (inevitably) Flickr goes away.  That’s great, but building this code shouldn’t be left to users.  As web application developers, we should encourage it.  When you build a client application, give it the ability to use an alternate API endpoint.  That way, if your site shuts down and your domain goes away, people can connect the client to another host.  Or they can run their application through a private API middleware that archives the things they want to keep private, away from your service.
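A configurable endpoint costs almost nothing to build in.  Here’s a hedged Python sketch: the base URL comes from an environment variable (the CHECKIN_API_BASE name and the URL paths are invented for this example), so the same client can talk to the original service, a community-run mirror, or an archiving middleware.

```python
import json
import os
from urllib.request import Request, urlopen

# The base URL is configurable, so if the original service disappears the
# same client can point at a mirror or a local archive instead.
API_BASE = os.environ.get("CHECKIN_API_BASE", "https://api.example.com")

def list_checkins(user_id, token=None):
    """Fetch a user's check-ins from whichever endpoint API_BASE points at."""
    req = Request(f"{API_BASE}/v1/users/{user_id}/checkins")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req) as resp:
        return json.load(resp)

# Point the client at the original host, a mirror, or an archiving middleware:
#   CHECKIN_API_BASE=http://127.0.0.1:8000 python client.py
```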

Eventually your site’s going to go away, and no matter how much lead time you give people, some day your funding will run out and there won’t be a site to host an export button anymore.  If your site is social, like MySpace or Facebook, the data has inherent privacy concerns.  You can’t post an archive of all of Facebook or MySpace for people to download; there are private messages, photos, comments and all kinds of other sensitive stuff in there.  But knowing this is going to be an issue, maybe we could create a standard method for authenticating a site’s users and bundling their data.  We could set up archive.org or some other site with enough ongoing donations (kind of like how the Federal Deposit Insurance Corporation works) to store all the data forever, and provide a self-service way to authenticate yourself and get at it.  Maybe even a volunteer team to help children and loved ones download a deceased relative’s data, or to help people who’ve lost access to the email addresses they signed up with.
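No such standard exists, but as one entirely speculative shape for a deposit, here’s a Python sketch of a per-user bundle: the user’s data plus a manifest carrying a salted hash of their email address, so an archive host could later verify an ownership claim without ever publishing the address.  Every name and field here is made up for illustration.

```python
import hashlib
import io
import json
import tarfile

def bundle_user_data(email, payload_bytes, out_path):
    """One possible shape for a per-user archive deposit: the user's data
    plus a manifest storing only a salted hash of their email, so the
    archive host can later check an ownership claim. Purely a sketch."""
    salt = "example-site-shutdown-2013"  # hypothetical per-site salt
    email_hash = hashlib.sha256((salt + email.lower()).encode()).hexdigest()
    manifest = {
        "schema": "user-archive-deposit/1",
        "owner_email_sha256": email_hash,
        "payload_sha256": hashlib.sha256(payload_bytes).hexdigest(),
    }
    with tarfile.open(out_path, "w:gz") as tar:
        for name, data in (("manifest.json", json.dumps(manifest).encode()),
                           ("data.json", payload_bytes)):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

# Example deposit for one user:
bundle_user_data("rider@example.com", b'{"checkins": []}', "rider.tar.gz")
```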

My birth mom kept a paper diary her entire life, and after she passed away from breast cancer the diaries were passed down to her kids.  The family got together at our house last month, and tidbits from her diaries came up several times, mentioned by my brother’s girlfriend, who never even had a chance to meet her.  Imagine if my mom had thought, “Oh, I’ll just use Gowalla to log what I do every day.”  Her future daughter-in-law (hint, hint, Don) would never have had the chance to know her in her own words.

Developers of the world, that’s the mandate.  Build your applications with their post-shutdown legacy in mind.  We need to consider it at every step in our development process, just like we consider deployment, usability and scalability.  We need to start building mechanisms for users to maintain their data without us before the money runs out.  We need user-centric exports built into the system from the start.  We need a way for users to get access to that data even when the site hosting the export button disappears.  We need all this so we can build the future with a clear conscience, knowing we’re leaving a legacy we can be proud of.

P.S. If you’re building legacy tools into your codebase, or know of someone who’s doing a really good job of this, leave me a comment.  I’d love to put up a post of real-world examples and pointers.

Update 1: Aaron Cope has an excellent talk/blog post on this topic as it relates to Flickr, explaining in eloquent detail the trials and concerns of someone who’s built a shardable version of a major social service.  You can (should) read it here.