Data Modeling with Generalizations – The Tool Issue

A bunch of factors have converged lately on the topic of generalized versus specific data modeling approaches. I’m working through the topic with two clients, and yesterday I attended a webinar by Len Silverston and Paul Agnew about Universal Patterns for Data Modeling. Then Paul posted to dm-discuss about performance issues with generalizations. I posted a couple of responses:

As I talk about in one of my presentations on Managing Codes and Reference Data* Mistakes, the biggest hurdle to working with generalized structures is that our tools (data modeling, database, enterprise architecture, etc.) have not caught up with this more modern method of modeling. They are all designed to manage requirements that are specifically modeled. Once we move a concept from an entity-attribute to an instance of an entity, we have no place to create specifications about that instance.
What typically happens is that this work is left to developers to figure out, and their tools aren’t any better at handling these generalizations. What used to be drag-and-drop query creation is now hand coding. DBAs can’t tune the structures as easily because they have no insight into what the data is going to be until real-world test data is created or real-world data is populated in the tables.
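To make the hand-coding point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and names (person_specific, contact_type, contact_mechanism, HOME_PHONE, WORK_EMAIL) are hypothetical, invented for illustration and not taken from any client model or from the Universal Patterns material. It stores the same facts in a specific structure and in a generalized one, then shows what it takes to get the same answer back out of each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Specific model: each contact concept is its own column.
    CREATE TABLE person_specific (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        home_phone TEXT,
        work_email TEXT
    );
    INSERT INTO person_specific VALUES (1, 'Ann', '555-0100', 'ann@example.com');

    -- Generalized model: the same concepts become rows (instances) of contact_type.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE contact_type (
        contact_type_id INTEGER PRIMARY KEY,
        type_name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE contact_mechanism (
        person_id       INTEGER NOT NULL REFERENCES person,
        contact_type_id INTEGER NOT NULL REFERENCES contact_type,
        contact_value   TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Ann');
    INSERT INTO contact_type VALUES (1, 'HOME_PHONE'), (2, 'WORK_EMAIL');
    INSERT INTO contact_mechanism VALUES (1, 1, '555-0100'), (1, 2, 'ann@example.com');
""")

# Specific model: the column names carry the meaning, so the query is trivial.
specific_row = conn.execute(
    "SELECT name, home_phone, work_email FROM person_specific"
).fetchone()

# Generalized model: the meaning lives in the data, so the query has to know
# the type instances by value and pivot the rows back into columns by hand.
generalized_row = conn.execute("""
    SELECT p.name,
           MAX(CASE WHEN t.type_name = 'HOME_PHONE' THEN m.contact_value END),
           MAX(CASE WHEN t.type_name = 'WORK_EMAIL' THEN m.contact_value END)
    FROM person p
    JOIN contact_mechanism m ON m.person_id = p.person_id
    JOIN contact_type t ON t.contact_type_id = m.contact_type_id
    GROUP BY p.person_id, p.name
""").fetchone()

print(specific_row)
print(generalized_row)
```

Both queries return the same row, but the second one only works if whoever writes it already knows that 'HOME_PHONE' and 'WORK_EMAIL' exist as instances. Nothing in the table definitions says so, which is exactly the information a query builder or a DBA would normally read from the schema.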
As data architects we can do up some sample/worked data examples in a spreadsheet, but there is no mechanism to manage those worked examples in our data models or to link those specifications together. Yes, some tools allow for enumerations to be managed, but these features don’t support the real-world complexity needed to show how this sample data is related to other data.
So we have two things that make it more difficult for DBAs and developers to work with generalized structures: tools that don’t support them well (if at all), and data architects who fail to architect the data that has been generalized out of tables and columns and into row instances. On my projects, architects are required to prepare and manage (read that as “architect”) data instances as well as structures. On projects where this doesn’t happen, the generalized structures are often implemented incorrectly.
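One way to picture “architecting” data instances is to keep a specification for each instance alongside the structure and check worked examples against it. The sketch below is only an illustration under my own assumptions: InstanceSpec, SPECS, check_rows, and the patterns are hypothetical names and rules, not a feature of any modeling tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Specification for one row instance of a generalized type table."""
    type_name: str
    definition: str
    value_pattern: str  # regular expression a value of this type must fully match


# Hypothetical instance specifications, managed by the data architect.
SPECS = {
    spec.type_name: spec
    for spec in (
        InstanceSpec("HOME_PHONE", "Phone number at the person's residence", r"\d{3}-\d{4}"),
        InstanceSpec("WORK_EMAIL", "Email address issued by the employer", r"[^@\s]+@[^@\s]+"),
    )
}


def check_rows(rows, specs):
    """Return a list of problems found in (type_name, value) sample rows."""
    problems = []
    for type_name, value in rows:
        spec = specs.get(type_name)
        if spec is None:
            problems.append(f"{type_name}: no specification for this instance")
        elif not re.fullmatch(spec.value_pattern, value):
            problems.append(f"{type_name}: value {value!r} does not match the specification")
    return problems


# Worked example data of the kind that usually ends up in a spreadsheet.
sample_rows = [
    ("HOME_PHONE", "555-0100"),
    ("WORK_EMAIL", "not-an-email"),
    ("FAX", "555-0199"),
]

problems = check_rows(sample_rows, SPECS)
for problem in problems:
    print(problem)
```

Here the second row fails its pattern and the third row uses an instance nobody specified, the kind of mistake that otherwise surfaces only after real data has been loaded.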
None of these problems are insurmountable. They are just challenges that we need to rise above.

* In my original post to dm-discuss, I referenced a different presentation, but it is the Managing Reference Data and Codes presentation where I covered this content.