BI Case Study: Building an Open Data Portal
This article was written by Paul Murphy, IT Program Leader at Co-operatives UK, about his experience using business intelligence technology to centralize years of messy historical data and create an open, explorable web portal. Do you have an interesting business intelligence case study to share? Email us at [email protected].
About one year ago today we decided it was time to take the statistical data our organization has been gathering for over a century and share it with the public through an open web portal. The results have been resoundingly successful, which is why I would like to share our story and demonstrate how open data can help you promote your organization and its goals.
Why Open Data?
Co-operatives UK – the network for Britain’s thousands of co-operative businesses, and a co-op itself – has, for over 100 years, produced a variety of statistics on the co-operative sector. Behind these statistics is a large repository of data that we maintain on all co-operatives in the UK and their economic and social impact. However, until recently this valuable and unique source of intelligence was kept internal and was not accessible to either our members or the public. Following a comprehensive review of our data strategy, a large data clean-up and investment in new systems, including BI analytics, we decided that we should start opening our data to the public.
Many commercial enterprises would balk at the idea of sharing their data with the public. The tendency, particularly in private companies, is to treat data as a well-kept trade secret, a private business asset, or some combination of the two. However, as we subscribe to the same principles of openness, education and information that all co-ops subscribe to, for us it was a natural thing to do.
Co-operatives UK (screenshot from web)
Our aim was to enable co-ops to work together better by providing more information about which co-ops operate in which sectors of the economy and where they are. We also wanted to make it easier for researchers examining the social impact that co-operatives have on communities to get access to the data they need. Co-operatives have always been about improving society, so when we say that co-operatives do more to tackle gender inequality or that co-operatives are helping to tackle tax avoidance, our Open Data initiative provides the evidence to back up these claims and allows researchers to independently verify that our stats are more than just PR.
However, this was easier said than done, considering…
You Need Clean Data
If you’re going to make data publicly available, you’ll obviously need to provide it in a form that people can actually understand. Giving people access to mounds of raw data, with no governing principle, no single format and no easy way to explore it, is unlikely to yield great results. In fact, this probably applies to any business intelligence initiative where you want people who aren’t IT professionals or data scientists to use data.
To give you an idea of the complexity of the data we were dealing with in our case, you need to understand that co-operatives come in many shapes and sizes: from sole traders to charities to PLCs. This means there’s no single regulator for co-operative businesses and no single governed dataset detailing them. This in turn means that we had to aggregate data from dozens of sources, including regulators and government departments, company websites, and our own knowledge and research. As our data strategy evolved and our capabilities grew, we learned to employ much more sophisticated data gathering techniques – such as automatic ingestion through APIs and screen scraping.
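To make the screen-scraping idea concrete, here is a minimal sketch using only Python's standard-library HTML parser. The page markup and the `coop-name` class are assumptions for illustration – the article does not name the actual sources that were scraped, and in practice the HTML would come from an HTTP fetch of a regulator's listing page.

```python
from html.parser import HTMLParser

class CoopNameScraper(HTMLParser):
    """Collects the text of elements tagged with class="coop-name" (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples
        if ("class", "coop-name") in attrs:
            self._capture = True

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.names.append(data.strip())

# A static snippet keeps the sketch self-contained; a real scraper
# would feed in the body of a fetched page instead.
sample_page = """
<ul>
  <li class="coop-name">Example Housing Co-operative</li>
  <li class="coop-name">Sample Workers Co-op Ltd</li>
</ul>
"""

scraper = CoopNameScraper()
scraper.feed(sample_page)
print(scraper.names)  # -> ['Example Housing Co-operative', 'Sample Workers Co-op Ltd']
```

For sources that expose a proper API, the same ingestion step would be an authenticated JSON request rather than parsing HTML, which is both more reliable and less likely to break when a site redesigns.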
We didn’t even know what data we had, let alone how to improve it. There were no schemata, lists or maps – just lots of disparate spreadsheets, systems and even post-it notes containing data. There was massive duplication, little shared understanding across teams, and poor data quality. We obviously needed to get the data to a state where we ourselves could make sense of it before making it publicly available.
Improving Data Integrity Is an Ongoing and Often Arduous Process
While not the primary focus of this article, this would be a good place to offer a word of advice: the amount of work you’ll need to put in to improve data quality depends on your organization and your data, but it’s as much about changing culture as it is about changing technology.
You’ll need to spend time and effort getting cross-organisation agreement on the meaning of data, adoption of new systems and changing ways of working with and thinking about data. For us, the challenge was understanding the data assets we had, implementing an organisation-wide data structure and introducing completely new systems and processes. With systems and structures in place, we then set about the huge task of cleaning and rebasing our legacy data to increase its accuracy and coverage.
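To illustrate one small piece of that clean-up work, here is a sketch of the kind of de-duplication pass legacy records typically need. The record fields and the name-normalisation rules are assumptions for illustration; real matching is fuzzier (registration numbers, postcodes, manual review) than a simple key comparison.

```python
def normalise(name: str) -> str:
    """Reduce a trading name to a crude matching key (illustrative rules only)."""
    key = name.lower().replace("-", "").replace(".", "")
    for noise in (" limited", " ltd", " cooperative", " coop"):
        key = key.replace(noise, "")
    return " ".join(key.split())

def merge_duplicates(records):
    """Group records by normalised name, keeping the most complete record per group."""
    best = {}
    for rec in records:
        key = normalise(rec["name"])
        completeness = sum(1 for v in rec.values() if v)  # count non-empty fields
        if key not in best or completeness > best[key][0]:
            best[key] = (completeness, rec)
    return [rec for _, rec in best.values()]

# Two spellings of the same (hypothetical) co-op collapse to one record,
# and the record with the postcode filled in survives.
legacy = [
    {"name": "Unity Co-operative Ltd", "sector": "Retail", "postcode": ""},
    {"name": "unity co-op limited", "sector": "Retail", "postcode": "M60 0AS"},
]
print(merge_duplicates(legacy))
```

A pass like this is only the mechanical half of the job – the harder half, as noted above, is getting teams to agree on what a record means in the first place.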
However, it’s all worth it: today, in addition to the Open Data initiative, our organisation’s strategy is much more informed by data, and staff monitor its implementation with real-time dashboards. Decisions are more evidence-based. We’ve moved our data into Salesforce, which is our primary data platform, and maintain a few additional bespoke data systems, all MySQL-based. Thankfully, the days of hundreds of spreadsheets of data are (mostly) behind us!
Launching the Web Portal
As in many data analytics projects, preparing the data was the hard part. After we made sense of our years of historical data, making it public was a pretty simple affair, and produced some exciting results.
We give anyone interested the option to download the data in CSV format through our website. But while CSV files are great for researchers and data geeks like me, in order to engage our staff, members, the rest of the co-operative movement and the media, we needed something more visual.
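For researchers who do take the CSV route, working with the download is a few lines of standard-library Python. The column names (`name`, `sector`, `turnover`) are assumptions for illustration – check the actual download for its real schema.

```python
import csv
import io

# An inline sample stands in for the downloaded file; with the real
# download you would use open("coop_economy.csv", newline="") instead.
sample_csv = """name,sector,turnover
Unity Co-op,Retail,1200000
Example Housing Co-op,Housing,450000
Sample Workers Co-op,Retail,300000
"""

reader = csv.DictReader(io.StringIO(sample_csv))
retail = [row["name"] for row in reader if row["sector"] == "Retail"]
print(retail)  # -> ['Unity Co-op', 'Sample Workers Co-op']
```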
Ready-made infographics are one way we present data for different audiences, and interactive data visualization also played a crucial role: as part of the Open Data initiative we launched the Co-operative Data Explorer, which gives anyone visiting our website access to the data via interactive dashboards and data visualizations. People can filter the data, or drill into specific time periods or even specific co-ops, instantly and right on our website, in whatever way is most meaningful to them. This part of the project was fairly easy, since we essentially used a white-labeled version of Sisense for the embedded dashboards.
Open Data Drives Trust, Traffic and Media
Our Open Data initiative is less than a year old, but is already creating heaps of value for us. If previously the data was mostly ‘collecting dust’ in a mess of disparate systems, today it is used on a daily basis, and staff and members alike have that much more confidence in the statistics, research and media work we do.
Open Data has enabled us to collaborate around data much more. A group of interested people has got together to explore how to further promote this collaboration (http://www.p6data.coop). We have held two hack-days with them so far, with some great ideas and prototypes in development.
Media coverage of our Co-operative Economy report at the start of this month has been great, even with Brexit taking all the headlines. This infographic outlines some of the impact we’ve had just this month:
Finally, looking at the web metrics for the data section of our website, including the dashboards, we can see visitors are coming in and are engaged – spending an average of three minutes and 27 seconds per page, which shows they are genuinely interested rather than just passing through.
About the Author
Paul Murphy is a data scientist and the IT Program Leader at Co-operatives UK, where he leads the modernization and transformation of the organization’s IT infrastructure and business intelligence processes.