Data Warehousing: Lessons We Have Failed to Learn
Over the last few years, I have been studying the reason that data warehouse projects fail. This, to me, is an interesting yet baffling subject – especially in this day and age – since we possess the tools, methods and skills to deliver them quickly and successfully, but so many still manage to fail.
In fact, mention ‘data warehouse’ to any experienced IT manager, and they think of a big, risky, expensive project that is likely to fail.
I still hear of projects where project managers and their developers defiantly ignore the warning signs and, like lemmings, leap headlong over the cliff of BI project failure.
Worse, we are seeing the advent of Big Data like it is going to be some huge saviour to the information management world. Frankly, I see it as a bigger disaster waiting to happen. With 80% of data warehouse projects failing, we clearly haven’t learned to walk, and yet now we want to run?
If we are still struggling to deliver everyday data warehouse projects, what hope do we have in succeeding with Big Data? Not only that, but with the race on to deliver mobile BI solutions, as well as Cloud-based BI solutions, we seriously need to get our house in order!
So I’ve gathered a few facts for you to ponder. Perhaps these may help you to ‘join the dots’ in understanding why data warehouse projects are notorious for going off the rails. Some of these points may raise ire with some, but that’s the point. There is a lot of blatant stubbornness among data warehouse project managers who continue to use out-dated methods to build BI solutions, and wonder why they still fail. They cling on to old-fashioned methods and toolsets like their lives depended on them, and would rather see a project fail than admit they might be wrong.
Remember, insanity is doing the same thing over and over again but expecting different results.
The single measure of success for any BI project. Are the users using it? If not, it has failed.
Users Don’t Know What They Don’t Know.
It is utterly pointless paying a Business Analyst to spend weeks asking users what they want from a BI project. Users don’t know - and will NEVER know – for absolute certain UNTIL they see something. What does that mean? It means that waterfall/SDLC as a methodology will never be appropriate for developing BI solutions. If you are utilising a Gantt chart for managing your BI project right now, you are heading for failure! It is become more widely known that Agile or Scrum methodologies work best for BI. Incremental, iterative steps are the way to go.
All BI Solutions Will Require Change.
Whether change comes from external or internal influences, it does not matter. It is inevitable. If your toolset/method/skills cannot embrace change, you are going to fail. If your ETL processes are like plates of spaghetti then change is not going to be easy for you. A data warehouse is a journey, not a destination, and often you will need to change direction.
Everybody Loves Kimball.
And why not? A star-schema or dimensional model, after all, is the single goal of any BI developer. A single fact table with some nice dimensions around it is Nirvana for an end-user. They’re easy to understand. It’s the goal for self-serve reporting. What can go wrong? Everything! The point is you can rarely go from source to star-schema without having to do SOMETHING to the data along the way. Especially if your fact table requires information from more than table or data source, then you face a lot of hard work to get that data into your star-schema. In a poll I conducted on LinkedIn a while back, I asked BI experts how much of a BI project was spent in just getting the data right. The answer came back as around 75-80%. Over three-quarters of any BI project is spent in just getting the data in a shape for BI! So when (not if) you are going to need to make changes, you will have a tougher job on your hands (especially if you built it using traditional ETL tools).
Everybody Ignores Inmon and Linstedt.
Bill Inmon, the Father of Data Warehousing, has written numerous books on the subject. His ‘Corporate Information Factory’ philosophy makes a lot of sense – take your data from all your sources and compose it into its Third Normal Form (3NF) in your Enterprise Data Warehouse (EDW). Why? It makes your data ‘generic’ and in its lowest level of granularity. Once you have your data in this state, it makes the perfect source for your Kimball star-schemas. Dan Linstedt extends on the 3NF model by introducing Data Vault, which provides a means of maintaining historical information about your data and where it came from. A unique feature of Data Vault is that you can understand what state your data warehouse was in at any point in time. So why does everybody ignore Inmon and Linstedt? Most likely because they are too complex to build and maintain using traditional ETL tools. Instead most developers will manage all the staging and data transformation in ancillary tables in a way only they understand using their favourite ETL tool. Good luck for when they finally leave your organisation!
ETL Tools Do Not Build Data Warehouses.
ETL tools were designed to move data from one place to another. Over the years, extra bits may have been cobbled on to perform tasks to ease the job of a data warehouse developer, but they still rely on too many other things. A decent target data warehouse model, for example. As discussed in the above points, ETL tools offer no help in providing a fast and effective means for delivering a design like a Third Normal Form or Data Vault Enterprise Data Warehouse. This means you need a data modelling tool plus the skills to design such architectures. Thankfully, we live in the 21st century where true data warehouse automation tools are emerging. These will help lead data warehousing out of the dark ages – especially with the advent of Big Data. Inmon and Linstedt have written the rules, now let the data warehouse automation tools take over!
To succeed in your data warehouse project, take an approach that embraces rapid change, and surround yourself with the tools, methods and people that are willing and able to support that.
While you need a star-schema for reporting and analytics, don’t try to take shortcuts to get there. You cannot go from source to star schema without having to something in between. Bill Inmon and Dan Linstedt have described how that ‘in-between’ bit should look. Ignore them at your peril! If you have multiple data sources, then DO look at building a 3NF or Data Vault EDW. To help you do that, look at getting a true Data Warehouse Automation tool.
If we are to succeed with Big Data, we need to be truly successful in data warehousing.
(data warehouse / shutterstock)
Attunity Compose (formerly known as BIReady) is a leading Data Warehouse Automation tool that actually designs and builds your data warehouse by analysing your source system(s)' data model and then generates a target DW model and all the ETL code to populate it. With Attunity Compose, you ...
Other Posts by Ian Nicholson
The moderated business community for business intelligence, predictive analytics, and data professionals.