Ever wonder how you can create meaningful insights with your data? Tired of being asked what your big data strategy is, whether you are doing predictive analytics, or how you are making use of the latest technologies? There is a way to get what you need: Data Science. The key to new knowledge is putting Data Science to work in your organization.


By the end of this article, you will have a straightforward strategy for working with your data to get the sort of meaningful insights you deserve. This article will help you develop your thinking beyond just databases, reports, and rules engines.


If you’re a business professional, you may be angry because at your core you know you cannot simply run a business on 30-day lag reports, but your business has only ever provided you more and more spreadsheets.

If you’re a data analyst, you probably want to move your organization past the trend-lines and frequency charts your reports deliver. After all, there is only so much one person can do with MS Excel, and you no doubt have questions that your current methods simply can't answer.

If you’re a data scientist, you are most likely making mistakes you are unaware of. Maybe not in the science component, or even the data component, but most likely in the organization component. In my judgment, this is the toughest career on the planet right now, primarily because you know what's possible and you have to do a better job of communicating it to non-technical colleagues who have little idea what's possible. There is a day in the not too distant future when upper management will be sharing stories with their associates: "Yeah - I had to let our Data Scientist go. He never really showed me what Data Science could do."


I am going to show you a simplified three-step workflow that any organization should adopt immediately if they want meaningful insights.


While I like to geek out like the rest on things like machine learning, nested cross-validation, and linear algebra... in an organizational setting this sort of stuff simply isn't the right place to start.

What I do know is that the steps below are the right first steps to maximize organizational value from a Data Science point of view, even though they don't appear to have the whiz-bang that the fancy math does:

  1. Generating the Right Questions
  2. Understanding the Data
  3. Producing Answers


Step 1: Generating the Right Questions

Intuitively, you would probably agree that not all questions are created equal, even within the same topic. Most organizations suffer from a case of the “rote-s”: rote questions that have been passed down year after year, predecessor after predecessor, all in the name of “that’s how we do it”. The problem is that, in the meantime, margins erode, technology is hard to keep up with, and competition is global.

For the fastest traction in your organization, start with the reports you pull today. What are the top 3 reports that have the greatest impact on your business?

For example, you may have a report that measures how many widgets you sold in the last 30 days.

  • Great report…but is this the best question to ask with this data?
  • Let me clarify what I mean – would it be easier to drive for 30 days looking through the rear-view mirror or through the windshield of the car?
  • Take your 30-day widget report and modify the question from “What did I sell?” to “What will I sell?”


Don't miss what I am communicating here. The historical report obviously has value, but augment your question set to include what might happen now and in the future. Just to be clear, by "what might happen" I am not talking about forecasting (the idea of simply extending the trend by a few periods).

Step 2: Understanding the Data

To make sure your newfound question will generate the sort of meaningful insight you are looking for, you have to take the next step: validation. You need to answer one question: do you have access, or could you get access, to the data that will help you answer your question?

Don’t believe it is that easy? Let’s return to our widget example.

From a Data Science point of view, the question of “what will I likely sell in the next 30 days” is simple. Think about it: you have all the historical transactions, which already take into account the typical questions a business thinks to ask. Things like seasonality, pricing changes, broad market factors – they are all baked in. In truth, there are also things baked in that even the most sophisticated business wouldn’t know to ask about. Confirm for yourself that you can get the transaction detail, not the summary detail, because that is what takes you to step 3. If you can’t get the data in this format, you can’t get to step 3.

Recall that your organization may be used to supplying summary data to business units, not transaction detail. When you ask for the transaction detail of what sold over the last 30 days, get everything you can from a data perspective. Watch out for traps like, "just tell me what data you need, specifically." Tell the supplier of your data you want all the data related to the transactions, everything. If you don’t know what you have to work with, you can’t get to work on step 3.
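
To make the difference between summary data and transaction detail concrete, here is a minimal sketch in Python. The records and field names (day, customer, channel, units, unit_price) are made up for illustration; your own transaction data will look different. The point it tries to show is that the summary number can always be rebuilt from the detail, but the detail can never be rebuilt from the summary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-level records. Every field name and value
# here is an assumption for illustration, not real data.
transactions = [
    {"day": date(2024, 3, 1), "customer": "A", "channel": "web", "units": 2, "unit_price": 10.0},
    {"day": date(2024, 3, 1), "customer": "B", "channel": "store", "units": 1, "unit_price": 12.0},
    {"day": date(2024, 3, 8), "customer": "A", "channel": "web", "units": 3, "unit_price": 9.0},
    {"day": date(2024, 3, 15), "customer": "C", "channel": "store", "units": 1, "unit_price": 12.0},
    {"day": date(2024, 3, 22), "customer": "A", "channel": "web", "units": 4, "unit_price": 9.0},
]


def summary_report(rows):
    """The classic summary: total widgets sold in the period."""
    return sum(row["units"] for row in rows)


def units_by(rows, field):
    """Break units sold down by any field present in the transaction detail."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[field]] += row["units"]
    return dict(totals)


# The summary is one number, and it is all a summary report can tell you.
total_units = summary_report(transactions)

# The detail lets you ask questions nobody thought of when the report was built.
by_channel = units_by(transactions, "channel")
by_customer = units_by(transactions, "customer")

print("Total units sold:", total_units)
print("Units by channel:", by_channel)
print("Units by customer:", by_customer)
```

If all you had received was the single total, the channel and customer breakdowns would be out of reach, which is why asking for "everything" matters before moving on.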

Step 3: Producing Answers

If you have made it to this step, you will quickly understand that finding answers that drive your business in meaningful directions is the easy part (I know what I just wrote, but it is true). Data Scientists provide answers using what is called a “learning model”. You might be thinking I am stating the obvious at this point, but here is a little-known fact: in Data Science, we use prior outcomes to predict likely future outcomes with learning models. While there is a bit more to it than “point-and-shoot and big insights will appear”, the idea is to ingest the data into a code pipeline (fancy talk for a series of computational steps) and generate insights. From there, not only will you be able to predict the probability of sales over whatever period your organization needs, but you can also quickly determine which variables in the data drive those outcomes.
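
For the curious, here is a rough sketch of what "using prior outcomes to predict likely future outcomes" can look like in Python. I am assuming a tiny logistic regression trained by plain gradient descent as the learning model; that is just one simple choice among many, not the only way to do it. The history rows and the two variables (discounted, weekend) are invented for illustration, and a real pipeline would add data cleaning, validation on held-out data, and far more records.

```python
import math

# Hypothetical prior outcomes: each row is (discounted, weekend, sold).
# The variable names and values are made up for illustration only.
history = [
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 1, 1),
    (1, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 0),
]
FEATURES = ["discounted", "weekend"]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability of a sale for one set of variable values."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def fit(rows, steps=2000, lr=0.1):
    """Learn one weight per variable from prior outcomes (gradient descent)."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for *x, y in rows:
            # How far the current prediction is from what actually happened.
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        # Nudge the weights in the direction that reduces the average error.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = fit(history)

# "What will I sell?": probability of a sale on a weekday, with and without a discount.
p_discounted = predict_proba(weights, bias, [1, 0])
p_full_price = predict_proba(weights, bias, [0, 0])

# Which variables drive the outcome: larger absolute weight means more influence.
drivers = sorted(zip(FEATURES, weights), key=lambda fw: abs(fw[1]), reverse=True)

print("P(sale | discounted):", round(p_discounted, 2))
print("P(sale | full price):", round(p_full_price, 2))
print("Variables ranked by influence:", [name for name, _ in drivers])
```

In this made-up history, discounted items sold far more often than full-price ones, while weekends made no overall difference, so the model should lean heavily on the discount variable. Reading weights as "influence" this way is a simplification that works here because both variables are on the same 0/1 scale.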