Why XML is incompatible with big data

Mike Driscoll lays out his misadventures using XML to manage large amounts of data: the format is too verbose, it slows down translation operations, and, despite the goals of the XML standard, its tags are opaque and cumbersome for humans to deal with. He concludes that there must be a better way: the simple, delimited text files we’ve been using since the fifties. He offers an analogy with LaTeX and MathML:

Spoken languages are strengthened by usage, not by imperial fiat, and data formats are no different. Far better to evolve and adapt the standards we already have (as JSON and SQLite’s file format do), than to fabricate new ones from whole cloth.

XML offers some nice advantages for interoperability when managing “human-sized” documents, but when it comes to truly large datasets I have to agree these benefits are outweighed by the overheads detailed in Mike’s article.
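
To get a rough feel for the verbosity overhead, here is a minimal sketch that serializes the same records as XML and as tab-delimited text and compares the byte counts. The sample records, field names, and helper functions (`to_xml`, `to_tsv`) are invented for illustration and are not taken from Mike’s article; real datasets will show different ratios depending on tag names and field widths.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample records, made up for this comparison only.
records = [
    {"id": str(i), "name": f"user{i}", "score": str(i * 10)}
    for i in range(1000)
]


def to_xml(rows):
    """Serialize rows as one <record> element per row, one child per field."""
    root = ET.Element("records")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(rec, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_tsv(rows):
    """Serialize rows as tab-delimited text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


xml_size = len(to_xml(records).encode("utf-8"))
tsv_size = len(to_tsv(records).encode("utf-8"))
print(f"XML: {xml_size} bytes, TSV: {tsv_size} bytes, "
      f"ratio: {xml_size / tsv_size:.1f}x")
```

The delimited file names each field once, in the header, while the XML version repeats an opening and closing tag around every value in every record, so the gap grows with the number of rows.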

Dataspora Blog: How XML Threatens Big Data

Link to original post
