Revisiting Data Warehouse Design

May 30, 2011

The data warehouse has now been with us for a quarter of a century.  Its architecture and infrastructure have remained largely stable over that period.  A range of methodologies for designing and building data warehouses and data marts has evolved over the years.  And yet, time after time, in one project after another, the same question is asked: “Why is it so difficult to accurately and reliably estimate the size and duration of data warehouse development projects?”

On Friday, 20 May, WhereScape launched its new product, WhereScape 3D, at the Boulder BI Brain Trust (BBBT) meeting.  3D, standing for “Data Driven Design”, is a novel and compelling approach aimed specifically at the design phase of data warehouse and data mart development projects.  It supports the data-focused experts whose skills and knowledge are vital to avoiding the sizing and scoping issues that frequently plague the development phase of these projects.

I provided a white paper for WhereScape as part of the launch.  This paper first explores the issues that plague data warehouse development projects and the most common trade-off made by vendors and developers: choosing between speed of delivery and consistency of the information delivered.  The conclusion is simple.  This trade-off is increasingly unproductive.  Advances in business needs and technological functions demand delivery of data warehouses and marts with both speed and consistency, along with reliable estimates of project size and duration.

One compelling solution to these issues emerges from taking a new look at the process of designing and building data warehouses and marts from a very specific viewpoint: the data itself and the specific skills needed to understand it.  From this, the paper surfaces the concept of data driven design and a number of key recommendations on how data warehouse design and population activities can best be structured for maximum accuracy and reliability in estimating project scope and schedule.

So, what is different about data driven design?  Briefly, it focuses on the planning phase of a data warehouse or data mart development project, before we bring in the ETL tool and the experts who build ETL.  This planning phase documents all that is known and can be discovered about the two key components of the development, the source data and the target model or database, at both a logical and a physical level.  The reason for this focus is simple: the more you know about these two components up front, the better your chance of avoiding the pitfalls so common in the development phase.
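To make the idea of documenting source and target before any ETL is built a little more concrete, here is a minimal Python sketch.  It is not how WhereScape 3D works; it is only an illustration of the general principle, and every name in it (`ColumnProfile`, `TargetColumn`, `profile_column`, `find_mismatches`, the sample column names and values) is hypothetical.  It profiles one source column and compares what it finds against the physical definition of the intended target column.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """What was discovered about one source column (hypothetical structure)."""
    name: str
    row_count: int
    null_count: int
    distinct_count: int
    max_length: int


@dataclass
class TargetColumn:
    """Physical definition of one target column (hypothetical structure)."""
    name: str
    max_length: int
    nullable: bool


def profile_column(name, values):
    """Collect basic facts about a list of source values."""
    non_null = [v for v in values if v is not None]
    return ColumnProfile(
        name=name,
        row_count=len(values),
        null_count=len(values) - len(non_null),
        distinct_count=len(set(non_null)),
        max_length=max((len(str(v)) for v in non_null), default=0),
    )


def find_mismatches(profile, target):
    """List the ways the profiled source data would not fit the target column."""
    issues = []
    if profile.null_count > 0 and not target.nullable:
        issues.append(
            f"{profile.name}: {profile.null_count} null value(s), "
            f"but target {target.name} does not allow nulls"
        )
    if profile.max_length > target.max_length:
        issues.append(
            f"{profile.name}: longest value is {profile.max_length} characters, "
            f"but target {target.name} holds only {target.max_length}"
        )
    return issues


# Made-up sample data standing in for a source extract.
source_rows = [
    {"customer_name": "Acme Ltd"},
    {"customer_name": None},
    {"customer_name": "A Very Long Customer Name Inc"},
]

profile = profile_column(
    "customer_name", [row["customer_name"] for row in source_rows]
)
target = TargetColumn(name="cust_name", max_length=20, nullable=False)
issues = find_mismatches(profile, target)

for issue in issues:
    print(issue)
```

With this sample data, both checks fire: the source contains a null and a 29-character name, neither of which fits the assumed target column.  Findings of this kind, surfaced during planning rather than halfway through ETL development, are exactly what feeds a more reliable estimate.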

To me, that’s money in the bank of IT!  And my only question to WhereScape is: why are you offering it for free?  There’s no excuse for data warehouse project managers; go download it and try it out!