SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

24 Questions to Ask when Preparing Data for Analysis

Eran Levy

Contents
  • Before you start: define the business questions
  • Where is the data?
  • Do you need to change the data?
  • How will you connect the data?
  • Do you need to further consolidate the data?
  • How will you import the data?
  • How will you verify the results?
  • Start analyzing!


Data preparation is perhaps the most important step in any type of serious data analysis. And while it would be ludicrous to attempt to cover such a broad field of knowledge in one article, we’ve prepared a quick checklist that you can run through when preparing data for analysis. Hopefully, this will help you optimize the data preparation process and make sure you have all the important steps and bases covered.

Before you start: define the business questions

We’ve written before about questions to ask during requirements elicitation, but as a general guideline – any type of data analysis starts by becoming familiar with the business questions you’ll want to answer and the KPIs you intend to measure.

A firm understanding of the business requirements will enable you to later map these demands back to the data and the types of analyses you’ll want to perform. Failing to understand at the outset what the business expects to see can lead to lots of wasted time and effort later down the line, so don’t skip this step!

Once you have a firm grasp of what your business expects to see as the final product of the analysis, you’ll want to start diving into the data. The first thing you’ll want to do is to find it.

Where is the data?

The first set of questions refers to the physical locations in which your organization’s data is stored. For a small deployment, this could be as simple as a series of spreadsheets; for larger ones, you might be looking at multiple databases, Hadoop data lakes, cloud sources or a data warehouse (read about the differences between databases, data marts, and data warehouses here).

You will also need to find out whether you have the required permissions to access the data, and which types or formats of data you’ll be dealing with.

The questions you want to ask in this stage are:

  • Which data sources does my organization work with?
  • Do I have the required permissions or credentials to access the data?
  • What is the size of each dataset and how much data will I need to get from each one?
  • How familiar am I with the underlying tables and schema in each database?
  • Do I need all the data for more granular analysis, or do I need a subset to ensure faster performance?
  • Will the data need to be standardized due to disparity – e.g., by combining data from an SQL database with a NoSQL source such as MongoDB?
  • Will I need to analyze data from external sources, which resides outside of my organization’s data stores?
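
The questions about dataset size and schema familiarity can often be answered with a quick profiling pass. The following is a minimal sketch, assuming the source is a SQLite database; the `profile_database` helper and the demo tables are hypothetical, and other databases expose the same information through their own catalog views.

```python
import sqlite3


def profile_database(conn):
    """Return {table_name: {"columns": [...], "rows": int}} for each table."""
    profile = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        (row_count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        profile[table] = {"columns": columns, "rows": row_count}
    return profile


# Hypothetical demo data standing in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 20.0), (2, 1, 5.5), (3, 2, 12.0)]
)

profile = profile_database(conn)
for table, info in profile.items():
    print(table, info["rows"], info["columns"])
```

Knowing the row counts and column names up front makes it easier to decide whether you need the full dataset or only a subset.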

Do you need to change the data?

Often data needs to be manually transformed or manipulated for effective analysis. This could be relevant when various tables or datasets use different formats for the same information, when the data is inconsistent or contains duplicate information, or when you want to group data in new ways.

Here’s what you’ll want to be asking:

  • For each individual source – is it complete? Accurate? Up to date?
  • In its current state, can I use the data to answer my business questions?
  • If there are inconsistencies or redundant values, what do I need to do to clean the data? Is it a matter of manually changing a few values or will a more systematic approach be necessary?
  • Will I be able to change the data in its original location, or would this need to be done in a secondary environment (e.g. cases where you do not have permissions to alter production data)?
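
When a few manual fixes are not enough, a more systematic approach might look like the sketch below. It standardizes two fields and then drops rows that become exact duplicates. The field names (`email`, `signup_date`), the list of accepted date formats and the helper names are all assumptions made for illustration; in particular, treating `01/03/2016` as day/month/year is a choice you would need to confirm against each source.

```python
from datetime import datetime

# Hypothetical date formats seen across sources; extend as needed.
# Assumption: slash-separated dates are day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def normalize_date(value):
    """Return the date as ISO 8601 text, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Standardize fields, then drop rows that become exact duplicates."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        date = normalize_date(rec["signup_date"])
        if date is None:
            rejected.append(rec)  # keep for manual review rather than guessing
            continue
        row = (rec["email"].strip().lower(), date)
        if row not in seen:
            seen.add(row)
            cleaned.append({"email": row[0], "signup_date": row[1]})
    return cleaned, rejected


raw = [
    {"email": "Ada@Example.com ", "signup_date": "2016-03-01"},
    {"email": "ada@example.com", "signup_date": "01/03/2016"},
    {"email": "grace@example.com", "signup_date": "20160305"},
    {"email": "linus@example.com", "signup_date": "sometime in 2016"},
]
cleaned, rejected = clean_records(raw)
print(len(cleaned), "clean rows,", len(rejected), "rows for manual review")
```

Working on a copy of the records, as here, also fits the case where you lack permission to alter production data.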

How will you connect the data?

If you’re working with many different data sources and tables, you’ll need to model the data in a way that enables dashboard users to quickly receive answers to ad-hoc queries by connecting related fields in different tables. The relationships between the various entities in your data model will determine the types of queries your future analysis will be able to answer, as well as the efficiency with which it does so.

Start by asking:

  • Which fields are appropriate to connect data together, from a business viewpoint?
  • What relationship will occur once these fields are connected? You’ll want to avoid many-to-many relationships.
  • Will my data model scale?
  • How easy will it be to add data sources and make changes to the model further down the road?
  • Can we simplify the relationship without affecting performance? Note that this might depend on the data preparation and analytics tools that you’re using.
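
One way to find out what relationship a pair of fields will produce is to check how often each key value repeats on either side. The sketch below is an illustration under simple assumptions: the `relationship_type` helper and the sample key lists are hypothetical, and the keys are assumed to fit in memory.

```python
from collections import Counter


def relationship_type(left_keys, right_keys):
    """Classify the join between two key columns by checking key uniqueness."""
    left_unique = max(Counter(left_keys).values(), default=0) <= 1
    right_unique = max(Counter(right_keys).values(), default=0) <= 1
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical key columns pulled from the tables you plan to connect.
customer_ids = [1, 2, 3]                  # customers.id
order_customer_ids = [1, 1, 2]            # orders.customer_id
campaign_regions = ["EU", "EU", "US"]     # campaigns.region
store_regions = ["EU", "US", "US"]        # stores.region

customers_to_orders = relationship_type(customer_ids, order_customer_ids)
campaigns_to_stores = relationship_type(campaign_regions, store_regions)
print("customers -> orders:", customers_to_orders)
print("campaigns -> stores:", campaigns_to_stores)
```

A many-to-many result is a signal to look for a different connecting field or to introduce an intermediate table before building the model.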

Do you need to further consolidate the data?

For certain types of more complex analyses, you might want to create new tables on top of your existing ones. One example of this can be a funnel analysis, in which you would want to take the basic information about an ongoing, multi-stage process and create various buckets into which each record would be categorized. Examples of questions that can help you understand whether you’re ready to go include:

  • Do I need to create summary tables for the types of analysis I want to perform?
  • Do I need to join data from the tables I’m working with using an inner or outer join, or to combine these tables to create a new one?
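
To make the funnel example and the join question concrete, here is a small sketch using SQLite. The `leads` and `deals` tables, the stage names and the rule that a NULL amount means an open deal are all hypothetical. It builds a summary table with one bucket per record and shows how many rows an inner join would keep by comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (id INTEGER PRIMARY KEY, source TEXT);
    -- Assumption for this sketch: a NULL amount means the deal is still open.
    CREATE TABLE deals (lead_id INTEGER, amount REAL);
    INSERT INTO leads VALUES (1, 'web'), (2, 'web'), (3, 'event'), (4, 'event');
    INSERT INTO deals VALUES (1, 500.0), (2, NULL), (3, 250.0);

    -- Summary table: one funnel bucket per lead.
    -- LEFT OUTER JOIN keeps leads without a deal; an inner join would drop them.
    CREATE TABLE funnel AS
    SELECT l.id AS lead_id,
           CASE
               WHEN d.lead_id IS NULL THEN 'lead'
               WHEN d.amount IS NULL THEN 'opportunity'
               ELSE 'won'
           END AS stage
    FROM leads AS l
    LEFT OUTER JOIN deals AS d ON d.lead_id = l.id;
""")

stage_counts = dict(
    conn.execute("SELECT stage, COUNT(*) FROM funnel GROUP BY stage")
)
(inner_rows,) = conn.execute(
    "SELECT COUNT(*) FROM leads AS l INNER JOIN deals AS d ON d.lead_id = l.id"
).fetchone()
print(stage_counts)
print("rows kept by an inner join:", inner_rows)
```

The outer join is what keeps the earliest funnel stage visible; with an inner join, leads that never became deals would silently disappear from the analysis.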

How will you import the data?

While there are certain situations in which you would create reports and analysis by querying the production databases, most BI tools and implementations will rely on creating an amalgamation of the data in a secondary environment which will serve as your analytical database. The questions you want to ask include:

  • Does the local or cloud server I move my data to have sufficient software and hardware to crunch the amounts of data I’m dealing with? The two are somewhat dependent, as the right software can reduce hardware costs.
  • At what frequency do I need to import the data? This depends on the rate at which the original data changes or grows.
  • How will importing the data affect my production environment?
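
If the source data changes frequently, one possible way to limit the load on production is to import only the rows that changed since the last run. The sketch below illustrates that idea with two in-memory SQLite databases; the `orders` table, its `updated_at` column (ISO 8601 text) and the `incremental_import` helper are assumptions, and it does not handle rows deleted at the source.

```python
import sqlite3


def incremental_import(source, target):
    """Copy only rows whose updated_at is later than the newest imported row."""
    (watermark,) = target.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()
    new_rows = source.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites rows that changed since the last import.
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", new_rows)
    target.commit()
    return len(new_rows)


schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)"
source = sqlite3.connect(":memory:")  # stands in for the production database
target = sqlite3.connect(":memory:")  # stands in for the analytical database
source.execute(schema)
target.execute(schema)

source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 20.0, "2016-03-01T09:00:00"), (2, 5.5, "2016-03-01T10:00:00")],
)
first_run = incremental_import(source, target)

# Later, one order is updated and one is added at the source.
source.execute(
    "UPDATE orders SET total = 7.5, updated_at = '2016-03-02T08:00:00' WHERE id = 2"
)
source.execute("INSERT INTO orders VALUES (3, 12.0, '2016-03-02T09:00:00')")
second_run = incremental_import(source, target)
print(first_run, second_run)
```

How often you run such an import still depends on the rate at which the original data changes or grows.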

How will you verify the results?

Before you can proudly announce that the data preparation is complete, you’ll want to make sure that the end result is accurate and that you haven’t made any mistakes along the way. To verify the data, ask questions such as:

  • Does it make sense on a general level?
  • Are the measures I’m seeing in line with what I already know about the business?
  • Do calculations in my analytical environment return the same results as the same calculations performed manually on the original data?
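
The last check, comparing calculations between the analytical environment and the original data, can be partly automated. Here is a minimal sketch, assuming both sides are reachable as SQLite connections; the `verify_totals` helper and the `orders` table are hypothetical, and a row count plus one column sum is only a basic sanity check, not a full reconciliation.

```python
import math
import sqlite3


def verify_totals(source, target, table, column):
    """Compare row count and column sum between original and imported data."""
    query = f'SELECT COUNT(*), COALESCE(SUM("{column}"), 0) FROM "{table}"'
    src_count, src_sum = source.execute(query).fetchone()
    tgt_count, tgt_sum = target.execute(query).fetchone()
    return {
        "row_count_matches": src_count == tgt_count,
        # Tolerate tiny floating-point differences between engines.
        "sum_matches": math.isclose(src_sum, tgt_sum, rel_tol=1e-9, abs_tol=1e-9),
    }


# Hypothetical demo: identical data on both sides, then a row goes missing.
rows = [(1, 20.0), (2, 5.5), (3, 12.0)]
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

ok = verify_totals(source, target, "orders", "total")
target.execute("DELETE FROM orders WHERE id = 3")
broken = verify_totals(source, target, "orders", "total")
print(ok)
print(broken)
```

A mismatch here does not say what went wrong, only that something did, which is usually enough to know where to start looking.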

Start analyzing!

After you’ve gone through the entire checklist above, you’ll have identified the data, transformed it, built your data model, moved the data to an analytical database and verified the results. This could be a matter of hours, days or more – depending on the amount of data you’re working with and its complexity.

If everything went well, you’re good to go – so go ahead and start building some dashboards! And read our guide to dashboard design to make sure you follow the core principles that will help you tell a clear and understandable story with your data.


 

