Although the Internet’s roots reach back to Cold War-era research networks of the late 1960s, it didn’t reach most American households until the mid-1990s. The proliferation of big data has since had a major impact on how the Internet evolves. But has it been for the better?
How Has Big Data Shaped the Old Internet?
The old internet is vanishing at an alarming rate, and big data might not be saving it. In fact, the drive for more information could be pushing people to abandon old data that seems less relevant. It’s not surprising that classic web hosting sites like Geocities have lost their appeal, but web pages from decades past are disappearing far faster than most people realize. The period spanning the late 1990s and the early 2000s was a unique part of digital human history. Yet efforts to protect and preserve that stage of communication are spotty at best.
A lost era that might not be salvaged by big data
The lack of effort to preserve the old internet is already well known among communities with an affinity for history. Even niche communities such as online music sharing groups lament the loss of old archives that once catalogued rare or unusual examples of their genres. Back then, web groups focused on niche hobbies and sustained them through file sharing and word of mouth. It’s an issue that is largely specific to the internet, especially for less popular media that large companies never catalogued or re-issued.
Some archiving services use big data to preserve these data sets. Archive.org is the best known. However, such services can only archive so much and rarely preserve rich media. It’s unlikely that we will ever lose access to major sitcoms or albums by popular artists, but shows with one short season on a smaller broadcast station or indie albums produced through minuscule labels always run the risk of being lost to time.
The threat is even greater for media that predates widespread broadband and high-capacity web hosting, because it didn’t benefit from the big data revolution in time. Many of the web rings that circulated smaller works have long since folded. The problem extends to the first five years of the consumer internet, much of which has vanished. Those were the first years in which large numbers of people had easy access to an internet connection, but many relics of that era were lost when web hosting contracts expired, and few preservation efforts were founded until the turn of the millennium.
Growing problems with modern big data solutions
At the heart of the problem rest two major issues: the internet isn’t static, and the cost of preserving it is difficult to ignore. Efforts like The Wayback Machine show how often a website must be checked for updates and backed up to ensure its various stages of life are preserved, but keeping up-to-date records of a site that changes too often is nearly impossible. Many early news sites and blogs are lost to time, whereas newspapers from hundreds of years ago are still occasionally available in preserved print and microfilm forms.
Webpages made up of text are often small and easy to preserve, but any website that deals in video or picture content is almost doomed to be lost. Even MySpace, once a bustling social media site hosting millions of songs, lost years of uploads during a server migration that effectively erased media uploaded to the site before 2015. The Internet Archive managed to post roughly 1.3 terabytes of the lost data that had been recovered, and the sheer size of that partial archive helps put into perspective just how much space is taken up when archiving large sites.
MySpace isn’t even the worst possible offender. Though photos and music files take up a large amount of storage space, a video site like Vimeo or YouTube makes archiving far harder still. Websites that visualize YouTube’s traffic give a sense of the scale involved. Without more effort and funding, preserving large swaths of the internet will be almost entirely impossible. There’s more attention on preservation, but there’s also more stress on those who have to update and maintain ever-sprawling databases as the internet continues to grow at a seemingly unbounded pace.
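To make the check-and-back-up cycle described above more concrete, here is a minimal Python sketch of one way an archive could store a new copy of a page only when its content has changed. It is an illustration under stated assumptions, not a description of how The Wayback Machine actually works: the names PageHistory, Snapshot, record and stored_bytes are hypothetical, and the page bodies are passed in directly rather than fetched from the network.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    captured_at: str  # timestamp label supplied by the caller
    digest: str       # SHA-256 hex digest of the page body
    body: bytes       # the archived content itself


@dataclass
class PageHistory:
    """Hypothetical store of the distinct versions seen for one URL, oldest first."""
    url: str
    snapshots: list = field(default_factory=list)

    def record(self, body: bytes, captured_at: str) -> bool:
        """Store body only if it differs from the most recent snapshot.

        Returns True when a new snapshot was stored, False otherwise.
        """
        digest = hashlib.sha256(body).hexdigest()
        if self.snapshots and self.snapshots[-1].digest == digest:
            return False  # unchanged since the last check; nothing to store
        self.snapshots.append(Snapshot(captured_at, digest, body))
        return True

    def stored_bytes(self) -> int:
        """Total size of all archived versions of this page."""
        return sum(len(s.body) for s in self.snapshots)


# Example: three checks of the same page, two of which see a change.
history = PageHistory("http://example.com/")
history.record(b"<h1>v1</h1>", "1999-01-01")
history.record(b"<h1>v1</h1>", "1999-01-02")  # identical, so it is skipped
history.record(b"<h1>v2</h1>", "1999-01-03")
```

Even in this toy version, two of the article’s points show up. Any version that appears and disappears between two checks is never captured, and storage grows with every change, which is manageable for small text pages but quickly becomes expensive for audio and video.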
Big Data hasn’t saved the old Internet
Older web content is at risk of being lost forever. Sadly, big data doesn’t appear to be saving it. We haven’t lost all of the early internet, but there are enough gaps in our coverage that an entire mini-generation of internet history is effectively gone. There are no easy answers, but raising awareness of the fleeting nature of internet media while encouraging users to take on some of the monumental task that is preservation could save more of the internet before it disappears into the digital void. Will new big data solutions actually solve these issues?