Big Data Flows vs. Wicked Leaks

I was invited to deliver a short keynote about “big data” at today’s OECD roundtable focused on the economics of personal data and privacy. My presentation here.

Most big data flows by design. But when big data leaks the consequences can be wicked.

That said … protecting big data from wicked leaks is not going to be easy. External cyber penetrations and insider threats are both hard problems to defend against.

Now with the Wikileaks disclosures it is clear the game has changed. Historically, public disclosure of classified data has been limited and infrequent; let’s even say, to a degree, tolerable. Contrast that with the scope of the recently leaked cables. The volume of the leakage is so significant and intolerable that I will not be surprised if a number of governments around the world attempt to enact new, wide-sweeping anti-leak legislation directed not only at those engaged in the initial theft of the data, but also at the distribution points (e.g., Wikileaks) and the publishers (e.g., the media). The principle would be that anyone who knowingly receives and benefits from stolen property is an accomplice. This pendulum could swing so far (backwards) that the future will have far fewer media leaks than the historical (tolerable) volumes. In other words, this whole fiasco could result in less transparency and accountability.

BTW: I suppose it could have been worse. What if the 250,000 classified cables were selectively and quietly passed around to various foreign intelligence services? What if the US still thought these were secrets? Imagine believing one has a certain security posture … when one does not. Would that be worse?

Organizations with big data worth protecting must employ extraordinary controls to reduce the risk of unintended disclosure. On that note, I closed with a few ideas related to protecting big data from wicked leaks, including:

Central indexes. There are actually a number of scenarios where a single, central catalog of pointers is better than lots of copies of the same data scattered all over the place – the advantage being fewer copies of the data and uniform access controls and audit logs.
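
To make the central index idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class name CentralIndex, the record IDs, and the "vault-a/..." location strings are hypothetical, and a real catalog would sit in front of actual storage systems. The point is only that one catalog of pointers gives one place to check permissions and one place to log every lookup.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CentralIndex:
    """Hypothetical catalog: maps a record ID to the one system that holds it."""
    pointers: dict = field(default_factory=dict)     # record_id -> location
    permissions: dict = field(default_factory=dict)  # user -> set of record_ids
    audit_log: list = field(default_factory=list)    # one entry per lookup

    def register(self, record_id, location):
        self.pointers[record_id] = location

    def grant(self, user, record_id):
        self.permissions.setdefault(user, set()).add(record_id)

    def lookup(self, user, record_id):
        allowed = record_id in self.permissions.get(user, set())
        # Every lookup is logged, whether or not it is allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not access {record_id}")
        return self.pointers[record_id]


index = CentralIndex()
index.register("cable-001", "vault-a/cable-001")
index.grant("alice", "cable-001")
location = index.lookup("alice", "cable-001")
```

Because the index hands out pointers rather than copies, a denied request leaves a trace in the same log as a permitted one, which is the uniform audit trail described above.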

Anonymization. Despite the imperfections of data anonymization, when it comes to reducing the risk of unintended disclosure, most would agree that data anonymization is still better than clear text.
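
As one illustration (not necessarily the technique used in any particular system), sensitive fields can be replaced with keyed hashes before the data is shared. The sketch below uses HMAC-SHA256 from the Python standard library. The key value, the function names pseudonymize and anonymize_record, and the sample record are all hypothetical. Strictly speaking this is pseudonymization: equal inputs still produce equal tokens, so records can be matched, and that is also one of the imperfections mentioned above.

```python
import hashlib
import hmac

# Placeholder key for illustration; a real key would be stored apart from the data.
SECRET_KEY = b"replace-with-a-key-kept-outside-the-dataset"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a keyed SHA-256 digest (hex string)."""
    normalized = str(value).strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record, sensitive_fields):
    """Return a copy of the record with the sensitive fields tokenized."""
    return {
        name: pseudonymize(value) if name in sensitive_fields else value
        for name, value in record.items()
    }


record = {"name": "Jane Doe", "phone": "555-0100", "city": "Paris"}
safe = anonymize_record(record, {"name", "phone"})
```

If the anonymized copy leaks, the names and phone numbers are not readable as clear text, although anyone holding the key could recompute the tokens for a guessed value.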

Immutable audit logs. Tamper-resistant audit logs can be used to help prove the users of the system are complying with law and policy.
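
One common way to approximate tamper resistance is a hash chain, where each log entry includes the hash of the entry before it. The sketch below is an assumption about how such a log could be built, not a description of any specific product; the function names and the event fields are hypothetical. Note that it is tamper-evident rather than tamper-proof: someone able to rewrite the entire log could recompute the chain, so the latest hash would need to be stored somewhere the log's administrators cannot alter.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(prev_hash, event):
    # Serialize deterministically so the same entry always hashes the same way.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log):
    """Return True only if every entry still matches its recorded hash chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "alice", "action": "read", "record": "cable-001"})
append_entry(log, {"user": "bob", "action": "export", "record": "cable-002"})
intact = verify(log)
```

Editing or deleting an earlier entry changes its hash, which no longer matches the "prev" value recorded in the next entry, so verify reports the break.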

Real-time active audits. It is now going to be essential that user activity be more rigorously analyzed, in real time, for inappropriate behavior. Audit logs have actually been part of the problem: they are just another big pile of data, with evidence of misuse hiding in plain sight against the backdrop of millions and millions of benign audit records.
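
A very simple form of real-time analysis is to check each access as it happens instead of searching the log afterward. The sketch below flags a user whose access count inside a sliding time window exceeds a threshold. The class name AccessMonitor, the threshold, and the window size are hypothetical choices for illustration; real detection of inappropriate behavior would need far richer signals than volume alone.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Hypothetical monitor: flags users who access too much, too quickly."""

    def __init__(self, max_events=100, window_seconds=60.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # user -> timestamps inside the window

    def observe(self, user, timestamp):
        """Record one access; return True if the user is over the threshold."""
        events = self._recent[user]
        events.append(timestamp)
        # Drop accesses that have aged out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_events


monitor = AccessMonitor(max_events=3, window_seconds=10.0)
flags = [monitor.observe("bob", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

Here the fourth access within ten seconds pushes the count past the limit of three, so only the last call returns True. The check runs at the moment of access, so the signal does not have to be dug out of millions of benign records later.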