Get an early start for on-time data modeling

July 23, 2011

I’m a data modeler, so I enjoyed Jonathon Geiger’s recent article entitled “Why Does Data Modeling Take So Long”.  But why does he say it like it is a bad thing?

Mr. Geiger’s bottom line is exactly right: “Most of the time spent developing data models is consumed developing or clarifying the requirements and business rules and ensuring that the data structure can be populated by the existing data sources.”  On the projects he describes, no one took the time before database design started to determine the available data sources or to identify the business entities of interest, the relationships among them, and the attributes that describe them, so the data modeler had to do it.

Taking the second point first, we often think modeling takes a long time because we don’t recognize the need for conceptual data modeling in requirements. I’ve written that “using data modeling techniques in requirements analysis reduces errors by improving requirements completeness, consistency, and communication, and provides unique continuity between analysis and design.” The International Institute of Business Analysis (IIBA) must agree:  the Business Analysis Body of Knowledge (BABOK) lists data modeling among the tools available to requirements analysts.  Its purpose, according to the BABOK, is “to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.”

For systems like data marts and warehouses that pull from existing source databases, investigation of current sources is a prerequisite of modeling.  Typically, some required data will not exist in source systems, and source data structures often contain inconsistencies and idiosyncrasies that modelers must understand before designing the database.  Mr. Geiger cites null values in a mandatory source field, a common problem in my experience.
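The null-values problem is easy to check for once someone actually looks at the source data. Here is a minimal sketch of that kind of profiling, assuming the source rows have already been extracted into Python dictionaries. The function name `profile_nulls`, the sample rows, and the field names are all hypothetical and only illustrate the idea; a real project would more likely run a profiling tool or SQL against the source system.

```python
from collections import Counter


def profile_nulls(rows, mandatory_fields):
    """Count rows where each supposedly mandatory field is missing or blank."""
    missing = Counter()
    for row in rows:
        for field in mandatory_fields:
            value = row.get(field)
            # Treat None and whitespace-only strings as missing values.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    # Report every mandatory field, including those with no problems.
    return {field: missing[field] for field in mandatory_fields}


# Hypothetical extract from a source customer table (illustrative data only).
source_rows = [
    {"customer_id": 1, "name": "Acme", "region": "East"},
    {"customer_id": 2, "name": None, "region": "West"},
    {"customer_id": 3, "name": "  ", "region": None},
]

report = profile_nulls(source_rows, ["customer_id", "name", "region"])
print(report)
```

A non-zero count for a field the source system documents as mandatory is exactly the kind of surprise that is better discovered before the target structure is designed.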

However, there are two reasons this is good news rather than bad.

First, if data modelers take time to make up for missing analysis, they can save the project. There is simply no way to design a satisfactory database without understanding the business entities, relationships, and attributes, and the data that will feed the database. By taking time to figure these things out, modelers not only design the right database but also positively influence the design of the application that uses it.  Modeling schedule overruns can be time well spent.

Second, I’ve seen managers go through the dynamic that Mr. Geiger describes and learn to start data modeling earlier. These project planners learn from their experience and bring in the data folks early, front-loading their work in the requirements process.  I’ve found in those cases that data modeling substantially improves the quality of requirements, and as a result the chances of a successful project.

One final note: all this is still the case on an Agile BI effort.  Requirements may be less structured, and iteration scope is of course much smaller, but sources must still be profiled, and business entities, relationships, and attributes understood, before a database can be designed successfully.
