Big Data is the Library of Babel

September 13, 2013

Richard Ordowich, commenting on my Hail to the Chiefs post, remarked how “most organizations need to improve their data literacy. Many problems stem from inadequate data definitions, multiple interpretations and understanding about the meanings of data.  Skills in semantics, taxonomy and ontology as well as information management are required.  These are skills that typically reside in librarians but not CDOs.  Perhaps hiring librarians would be better than hiring a CDO.”

I responded that maybe not even librarians can save us, citing The Library of Babel, a short story by the Argentine author and librarian Jorge Luis Borges. As James Gleick explained in his book The Information: A History, A Theory, A Flood, the story is about “the mythical library that contains all books, in all languages, books of apology and prophecy, the gospel and the commentary upon that gospel and the commentary upon the commentary upon the gospel, the minutely detailed history of the future, the interpolations of all books in all other books, the faithful catalogue of the library and the innumerable false catalogues.  This library (which others call the universe) enshrines all the information.  Yet no knowledge can be discovered there, precisely because all knowledge is there, shelved side by side with all falsehood.  In the mirrored galleries, on the countless shelves, can be found everything and nothing.  There can be no more perfect case of information glut.”

More than a century before the rise of cloud computing and the mobile devices connected to it, the imagination of Charles Babbage foresaw another library of Babel, one where “the air itself is one vast library, on whose pages are forever written all that man has ever said or woman whispered.”  In a world where word of mouth has become word of data, sometimes causing panic about who may be listening, Babbage’s vision of a permanent record of every human utterance seems eerily prescient.

Writing about the cloud, Gleick observed how “all that information—all that information capacity—looms over us, not quite visible, not quite tangible, but awfully real; amorphous, spectral; hovering nearby, yet not situated in any one place.  Heaven must once have felt this way to the faithful.  People talk about shifting their lives to the cloud—their informational lives, at least.  You may store photographs in the cloud; Google is putting all the world’s books into the cloud; e-mail passes to and from the cloud and never really leaves the cloud.  All traditional ideas of privacy, based on doors and locks, physical remoteness and invisibility, are upended in the cloud.”

“The information produced and consumed by humankind used to vanish,” Gleick concluded, “that was the norm, the default.  The sights, the sounds, the songs, the spoken word just melted away.  Marks on stone, parchment, and paper were the special case.  It did not occur to Sophocles’ audiences that it would be sad for his plays to be lost; they enjoyed the show.  Now expectations have inverted.  Everything may be recorded and preserved, at least potentially: every musical performance; every crime in a shop, elevator, or city street; every volcano or tsunami on the remotest shore; every card played or piece moved in an online game; every rugby scrum and cricket match.  Having a camera at hand is normal, not exceptional; something like 500 billion images were captured in 2010.  YouTube was streaming more than a billion videos a day.  Most of this is haphazard and unorganized.”

The Library of Babel is no longer fiction.  Big Data is the Library of Babel.