Public Information

November 5, 2009
In a past life I co-founded and led a company specialised in the analysis of publicly available textual information. The basic idea I had was that if you could read every newspaper published every day, and throw in other easily accessible information sources such as company filings to stock exchanges and national regulators, then you would learn a lot about:

  • mergers and acquisitions
  • investment strategies
  • alliances and pilot projects using emerging information technologies.

The hunch was right, and after 1 million lines of code and several million dollars, we had a system that could automatically extract meaning from news 24/7. We launched a subscription service that read thousands of articles each day and then visualised these activities. We sold many subscriptions to leading corporations and government agencies around the globe.

On the right is a print-out from that application. Everything was interactive and you ‘surfed’ through the ‘mind maps’ using a simple point-and-click interface. Here is a larger file that you can zoom into and explore: The Virgin Group.

In the course of this journey from startup to a real company with cash flow, we were presented with an interesting dilemma.




Early on in the history of the company, we produced a detailed view of the personal investment strategy of the founder of one of the world’s largest software companies. We did it by analysing the web of legal entities he had established and then tracking share transactions, venture capital and M&A deals, news reports and so on.

The end result was very interesting, and a printed copy of one of the main mind maps was lent to a Board Member of Philips (the Dutch electronics group). By coincidence, the same founder whose investment strategy we had analysed turned up for a meeting and saw the chart in their office. Apparently he hit the roof and thought that he was the target of some sort of corporate espionage.

This got us thinking and, because I had lived in the States, I immediately contacted our trusty lawyer. Our legal counsel advised us that in the US we ran the risk of being sued for breaching privacy laws. This was despite the fact that all of our source information was freely available.

As a side note, we solved the problem by agreeing not to locate our technology on servers in the US. We remained offshore and were (presumably) much harder for US citizens to reach legally. It meant that we paid a little more for the data centre and internet traffic – but I think it was worth it.

Why am I writing this today? Well, ever since then, I have had a keen interest in what public information is available and what you are allowed to do with it. So I read with interest the following article in Ars Technica:

Lobbyists beware: judge rules metadata is public record

The Arizona Supreme Court has ruled that the metadata attached to public records is itself a public record. Given the frequency with which metadata outs lobbyists’ and corporations’ efforts to mask their own contributions to public debates, this is a good thing.

Ars Technica, By Jon Stokes | October 29, 2009

The Arizona state Supreme Court has ruled that the metadata attached to public records is itself public, and cannot be withheld in response to a public records request. Such a ruling on file metadata may not seem like a huge win for open government advocates, but it definitely is, given that metadata has unmasked more than one lobbyist’s effort to influence Congress.

In the Arizona case, a police officer had been demoted in 2006 after reporting “serious police misconduct” to his superiors. He suspected that the demotion was done in retaliation for his blowing the whistle on his fellow officers, so he requested and obtained copies of his performance reports from the department. Thinking that perhaps the negative performance reports had been created after the fact and then backdated, he then demanded access to the file metadata for those reports, in order to find out who had written them and when.

The department refused to grant him access to the metadata, and the matter went to court. After working its way through the court system in a series of rulings and appeals, this past January an Arizona appeals court ruled that even though the reports themselves were public records, the metadata was not. It turned out that Arizona state law doesn’t actually define “public record” anywhere, so the appeals court relied on various common law definitions to determine that the metadata, as a mere byproduct of the act of producing a public record on a computer, was not a public record itself.

The case was then appealed to the Arizona state Supreme Court, which has now ruled that the metadata is, in fact, a public record just like the document that it’s attached to.

Metadata follies, and the case of Google

If you want to know how important metadata can be in public policy deliberations, Google’s history with it can be instructive, since the search giant has been both hurt and helped by metadata snooping.

Last year, the Australian Competition and Consumer Commission (ACCC) received hundreds of electronically submitted feedback letters opposing eBay Australia’s decision to go PayPal-only for accepting auction payments. One of the most impressive letters was a 38-page missive that had obviously been written by someone with extensive and intimate knowledge of payment systems. A look at the letter’s PDF metadata revealed that the author of the letter was none other than Google, which was upset that Google Checkout was being excluded in favor of PayPal. The metadata also revealed, embarrassingly enough, that the PDF had been written not in Google Docs, but in Microsoft Word.

The very next month, the tables were turned when the American Corn Growers Association somewhat surprisingly threw its weight behind the idea that Congress should launch a hearing to look into the possible anti-trust implications of the Google-Yahoo advertising deal. CNET’s Declan McCullagh took a look at the PDF letter that the group submitted to Congress, and found that it had been authored by a staffer at the LawMedia Group, a DC lobbying shop whose client list includes the anti-Google, anti-net neutrality National Cable and Telecommunications Association.

To leave Google’s metadata mixups and go back even further in time, one of the most famous metadata lobbying goof-ups occurred in 2004, when Wired busted California Attorney General Bill Lockyer circulating an anti-P2P letter that, after a look at its Word metadata, appeared to have been either drafted or edited by the MPAA.

As open government projects that solicit feedback from the public gain traction at the federal and local level, these types of metadata-related discoveries will become more and more common. Guaranteeing that file metadata is available to the public will help to ensure that we know who is trying to influence public discussion.

People complain (endlessly) about America – but we in Australia can only dream of having the public right to a tenth of the information made available in the US.

Does this disadvantage the practice of analytics in Australia? You bet – and we are a poorer nation for it.

Now, if I can just modify my code to automatically analyse the metadata of the datasets that the Australian Federal and State Governments are now releasing under FOI-like (Freedom of Information) licences, maybe there are interesting things to be learnt.
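
For anyone wondering what that kind of metadata analysis might look like, here is a minimal sketch in Python using only the standard library. It assumes the released files are Office Open XML documents (.docx, .xlsx, .pptx), which are ZIP archives that normally carry author and timestamp fields in docProps/core.xml. The names read_core_properties and build_sample_package are my own illustrative choices, not part of any existing tool, and PDFs or plain CSV files would need a different approach.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Fields of interest: who wrote the file, who last saved it, and when.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return authorship metadata from an Office Open XML file.

    `source` is a file path or a binary file-like object.
    Fields the file does not record are returned as None.
    """
    with zipfile.ZipFile(source) as package:
        try:
            raw = package.read("docProps/core.xml")
        except KeyError:
            # The archive has no core properties part at all.
            return {name: None for name in FIELDS}
    root = ET.fromstring(raw)
    return {
        name: root.findtext(tag, default=None, namespaces=NS)
        for name, tag in FIELDS.items()
    }


def build_sample_package(creator, modified):
    """Build a tiny in-memory package for demonstration (hypothetical helper)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}" xmlns:dcterms="{dcterms}">'
        "<dc:creator>{creator}</dc:creator>"
        "<dcterms:modified>{modified}</dcterms:modified>"
        "</cp:coreProperties>"
    ).format(creator=escape(creator), modified=escape(modified), **NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Made-up values, purely to show the shape of the result.
    sample = build_sample_package("A. Staffer", "2009-10-29T09:00:00Z")
    print(read_core_properties(sample))
```

Pointing read_core_properties at a downloaded file path returns a dictionary with creator, last_modified_by, created and modified, with None for anything the file does not record. Running it over a whole folder of released datasets and tallying the creator values would be the obvious next step.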

Anyone interested? Malcolm??
