Are we doing enough to archive digital culture? 🗄️

In this week’s free Cybercultural newsletter, I look at the problem of archiving cultural content from magazines, blogs, musicians and video gamers – and why it matters.

Have you run into any of these problems lately when trying to access online content?

  • You wrote a blog in the early to late 2000s, but it’s long since gone (or changed irrevocably, in my case). Can you still experience that blog in all its noughties HTML glory, via the Wayback Machine? The answer, as you’ll see below: it depends…

  • You access all your favourite music by streaming these days, so you no longer buy and/or download music. But you’ve just discovered that a niche artist you admired from the 90s has removed half their albums from Spotify. What’s more, that music is no longer available to purchase – or download via P2P. What to do?

  • You’re a Fortnite player and you’re looking for strategy tips. You remember a Twitch stream from a few months ago, where the streamer gave out some awesome tips about your current game scenario. But it turns out he didn’t opt into archiving his videos (archiving is turned off by default). The Wayback Machine doesn’t have it either, so that content is lost forever.

The Internet Archive is fighting an uphill battle

I was inspired to write this post by a weekend tweet from Jason Scott, who works for the Internet Archive.

Scott mentioned the imminent closures of two well-regarded online publications, Linux Journal and Pacific Standard. Here’s what he said about how those two sites will be archived on the Wayback Machine:

“Archive Team will grab pretty good copies of these sites, but they won’t be perfect, because every modern site is essentially resistant to being copied or used in any reasonable fashion. They’ll render in browsers RIGHT THIS SECOND but will almost certainly not, soon.”

The Internet Archive is an international treasure; I’d argue it’s one of the two most valuable cultural resources on the Web, alongside Wikipedia.

Scott has a point that the Archive team is fighting an uphill battle to preserve cultural artefacts like Linux Journal and Pacific Standard. Yet despite the unavoidable flaws in its archiving, what the Wayback Machine saves is a damn sight better than what publishers do. I speak from experience, as an online publisher who didn’t make sufficient backups of my previous online media business.

I discovered the error of my ways a couple of years ago, when I created an archive on my personal website of all the posts I wrote for ReadWriteWeb – the tech blog I ran from 2003 to 2012. The site, now called ReadWrite, has had two owners since me and multiple redesigns. While I was doing my personal archiving project, documented in this July 2017 blog post, I found that linking to my posts on the current ReadWrite site was very unsatisfactory. So I ended up linking to Internet Archive copies of my articles instead. Here’s why I chose to do that:

“The main reason is that the current ReadWrite design has moved too far away from my original RWW brand. That’s created a feeling, at least for me, that the old content looks out of place on the current site. The latest ReadWrite designs also introduced a bunch of technical glitches to the older content.”

It made me wish I had done proper backups of RWW over the years, including documenting all the different designs. So to answer my own headline question: no, I didn’t do enough. Regardless, I’m thankful to the Internet Archive for doing its best to preserve RWW’s small (but meaningful, at least to me) contribution to online media history.

Your favourite music may not always be there

Every now and then I’ll open Spotify and notice that one or more of my favourite albums has disappeared. For instance, Duran Duran’s second most recent album, All You Need Is Now, is not on Spotify. I do have it saved on iTunes on my laptop, but I don’t have it available right now on my iPhone (in order to save space, I only transfer a portion of my rather massive music collection onto my phone).

Now admittedly, this is a first-world problem: a relatively minor Duran Duran album is not currently accessible to me via my smartphone! Most of Duran Duran’s other albums are available on Spotify, so it’s not really a big deal. But it does illustrate that you can’t necessarily rely on streaming music platforms to always have all the music you want. If you want to guarantee that, you’ll have to buy all your favourite music and save digital copies onto iTunes or a similar music software app.

But what if you can no longer buy the music you want, and it isn’t on Spotify or another streaming app? There’s an artist I like, Danielle Dax, who released a slate of albums from the 80s through to the early 90s. But only one, Blast The Human Flower (1990), is on Spotify. That album is only available on Spotify because it was her one major-label studio release. If you go looking for Dax’s other albums to purchase (which I have), you won’t find them in any digital store: they’re not on iTunes, Google Play, or even Bandcamp. In fact, the artist doesn’t even sell her old music on her personal website.

Okay, unlike with online publications, with music you should at least be able to rustle up second-hand copies of old CDs or vinyl. I’m sure I could build a thorough Danielle Dax collection that way, if I spent enough time and money tracking down all her old albums.

But it makes me wonder: are we too lackadaisical as music fans in this streaming era? We assume Spotify or Apple Music or Tidal will have any music we want, now and into the future. But just as with online publications, those cultural artefacts can disappear at any time, so you’d better have a copy somewhere.

There’s no time machine for online gaming

At first glance, the streaming of online gaming doesn’t have an archiving problem. After all, the leading platform – Amazon’s Twitch – gives users the option to archive their content (albeit, as noted above, this option has to be turned on).

Here’s the catch though: the archives are kept for a short time only, even for premium streamers.

“Now that you have enabled archiving, Twitch will automatically save your broadcasts. For non-turbo users your videos will be saved for 14 days before being deleted. Twitch Partners and Twitch Turbo users will have their broadcasts saved for 60 days before being deleted.”

The only type of content Twitch users can keep “indefinitely” is what’s termed “highlights.” Twitch defines these as past broadcasts that have been chopped into “shorter video segments.”

There are some exceptions to the limits. For example, famous streamer Ninja has 901 videos on his Twitch account as I write this. However, since he has now moved to Microsoft’s streaming service, Mixer, in a lucrative deal, one wonders whether those videos will stick around much longer.

It should be noted that Twitch does allow creators to download their streams before they expire, so creators can archive their own content. But of course that doesn’t help other users who may want to view that content.

You can’t blame Amazon for imposing these time limits. Even though it’s one of the biggest cloud storage providers in the world, archiving video streams at scale requires a huge amount of storage space. That said, YouTube seems to have no issue with storing consumer video indefinitely. In any case, in August 2014, around the time of the Amazon acquisition, Twitch sent this message to its users:

“We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.

We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.”

At the time of the Twitch sale to Amazon, the Internet Archive took a number of snapshots of Twitch content (the archivist is listed as Jason Scott, whose tweet I quoted above). There are currently 2,215 items in the Twitch archive, mostly from 2014 but some from 2015.

As for current Twitch content, the homepage has been saved “44,394 times between July 5, 2002 and August 13, 2019.” Unfortunately, when I clicked through to view some examples, I was presented with this message:

That says it all, really. Streaming video content will most likely not age well, unless streamers archive it themselves.

Conclusion

While I love the Internet Archive, when it comes down to it, the best way to archive cultural content is for the creators themselves to do it.

I should’ve archived ReadWriteWeb before I sold the site in late 2011. In fact, I should’ve saved it multiple times over the years, to preserve the different designs it had from 2003 onwards.

Ideally, Danielle Dax should sell digital copies of her music on a platform like Bandcamp (to be fair, it’s possible she doesn’t own the rights to it – which is sadly not uncommon in the music world).

And for you Twitch streamers out there, if you think what you do has cultural value, why not archive it – and maybe even offer it to the Internet Archive. I’m not sure if they’d take it, but it doesn’t hurt to make the offer.

Our digital culture is valuable; let’s do better at preserving it.


You’ve just read the free weekly post from Cybercultural, a newsletter covering the intersection of technology and the cultural industries.

To receive all three editions of Cybercultural per week, please consider upgrading to a paid subscription. It’s just $7 per month or $70 per year. The subscriber editions feature exclusive value-add sections, such as ‘What You Need To Know’ (my analysis of the day’s culture-tech news), ‘Data Points’ (stats), ‘Deals’ (M&A, funding), and more.


Thanks for considering it, your support would mean a lot. 🙏