A Question of Scope

Don’t bite off more than you can chew because nobody looks attractive spitting it back out.

– Carroll Bryant

I apologize for the image – but the sentiment is apt. My mother would have said that “my eyes were bigger than my stomach” and admonished me for letting good food go to waste. The problem is that when the size of the project is beyond the budget constraints of the business, or the capacity of the development team, work goes unfinished. And so it’s essential that the scope of the effort be understood before any commitment is made to complete it.

The scope of a data management initiative is influenced by three factors:

  1. Reference Architecture
  2. Source Data Requirements
  3. Target Dimensional Breakdown

Reference Architecture

The reference architecture determines the number of areas that need to be modeled and the number of times the data will be migrated from one sector to another. Each sector also has its own degree of complexity – some considerably less than others.

  • What is the reference architecture of the target solution?
  • How many sectors of the reference architecture are targeted for this initiative?
  • Are we building on an existing structure or is this the initial project?
  • Do you have existing architecture, naming and content standards?
EDW Reference Architecture

Source Data Requirements

The source data requirements are the list of fields that are to be brought into the target system. It is important to understand that a cost attaches to each one of these fields: drawing one field from a source table does not make the remaining fields in that table free, because each additional field adds its own effort. Therefore, the scoping needs to be based on the number of individual pieces of information, rather than the number of source tables.

  • How many distinct fields are required? (e.g., Customer First Name, Date of Birth, Industry Classification Code)
  • How many different source systems are involved?
  • How many different source tables are involved?
  • How many fields are being drawn from multiple sources? (e.g., Customer First Name coming from Marketing database and Point of Sale system)
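
The four questions above can be answered mechanically once the requested fields are written down as an inventory. What follows is only a minimal sketch in Python: the inventory layout, the table names (contacts, customers, accounts) and the function name summarize_sources are assumptions made for illustration, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical field inventory: one row per (field, source system, source table).
# The table names are invented for illustration.
FIELD_INVENTORY = [
    ("Customer First Name", "Marketing", "contacts"),
    ("Customer First Name", "Point of Sale", "customers"),
    ("Date of Birth", "Marketing", "contacts"),
    ("Industry Classification Code", "Marketing", "accounts"),
]


def summarize_sources(inventory):
    """Count fields, systems and tables, and list fields fed by several systems."""
    systems_by_field = defaultdict(set)
    systems = set()
    tables = set()
    for field_name, system, table in inventory:
        systems_by_field[field_name].add(system)
        systems.add(system)
        # A table is only unique within its own system.
        tables.add((system, table))
    multi_source = sorted(
        name for name, used in systems_by_field.items() if len(used) > 1
    )
    return {
        "distinct_fields": len(systems_by_field),
        "source_systems": len(systems),
        "source_tables": len(tables),
        "multi_source_fields": multi_source,
    }


summary = summarize_sources(FIELD_INVENTORY)
print(summary)
```

Note that the count is keyed on fields rather than tables, which matches the point made above: two fields from the same table still count as two pieces of work.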

Target Dimensional Breakdown

The target dimensional breakdown determines the breadth of subject areas being modeled, and the potential complexity of the processes involved.

The measures are numeric “counts and amounts”. Some come directly from source systems and are relatively straightforward; others need to be calculated or derived from component values. Derived measures are more complex, and will involve the storage of both the components and the resulting calculations.

The dimensions give the measures context (e.g., sales by product, balances by branch). The number of different dimensions that need to be modeled and potentially mastered can have a significant impact on development time, particularly when they involve hierarchies.

  • What are the measures?
  • How many require calculations/derivations vs. coming directly from the sources?
  • What dimensions are being requested (customer, employee, store, branch etc.)?
  • What hierarchies are being requested?
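
The same kind of tally works for the target side. The sketch below, again in Python, assumes a measure is "derived" whenever it lists component values, and that a dimension has a hierarchy whenever it lists levels. The class names, the sample measures and the hierarchy levels are all hypothetical and exist only to show the shape of the count.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    # Component values used in the calculation; empty means "direct from source".
    components: list = field(default_factory=list)

    @property
    def is_derived(self):
        return bool(self.components)


@dataclass
class Dimension:
    name: str
    # Hierarchy levels from top to bottom; empty means no hierarchy requested.
    hierarchy: list = field(default_factory=list)


def summarize_target(measures, dimensions):
    """Split measures into direct vs. derived and list dimensions with hierarchies."""
    derived = [m.name for m in measures if m.is_derived]
    return {
        "measures": len(measures),
        "direct_measures": len(measures) - len(derived),
        "derived_measures": derived,
        "dimensions": len(dimensions),
        "hierarchies": [d.name for d in dimensions if d.hierarchy],
    }


# Hypothetical request, invented for illustration.
measures = [
    Measure("Sales Amount"),
    Measure("Units Sold"),
    Measure("Average Selling Price", components=["Sales Amount", "Units Sold"]),
]
dimensions = [
    Dimension("Product", hierarchy=["Category", "Subcategory", "Product"]),
    Dimension("Branch", hierarchy=["Region", "Branch"]),
    Dimension("Customer"),
]

target = summarize_target(measures, dimensions)
print(target)
```

Keeping the components on each derived measure is deliberate: it is a reminder that both the inputs and the calculated result have to be stored, so a derived measure costs more than a single direct one.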

Once you’ve established the scope of the project, you can estimate the time it will take to develop it. The answers to the questions above are important inputs into understanding the order of magnitude of the task at hand, and the questions themselves are essential to work through when communicating with the internal development team or a potential vendor.

We all know that scope can creep once the project gets going, and that becomes a management issue; but it’s important to start with a baseline understanding of the size of the task at hand. It helps set appropriate expectations for everyone involved. And it’s good for the digestion.