SmartData Collective

24 Questions to Ask when Preparing Data for Analysis

Eran Levy

Contents

  • Before you start: define the business questions
  • Where is the data?
  • Do you need to change the data?
  • How will you connect the data?
  • Do you need to further consolidate the data?
  • How will you import the data?
  • How will you verify the results?
  • Start analyzing!

Data preparation is perhaps the most important step in any type of serious data analysis. And while it would be ludicrous to attempt to cover such a broad field of knowledge in one article, we’ve prepared a quick checklist that you can run through when preparing data for analysis. Hopefully, this will help you optimize the data preparation process and make sure you have all the important steps and bases covered.

Before you start: define the business questions

We’ve written before about questions to ask during requirements elicitation, but as a general guideline, any type of data analysis starts with becoming familiar with the business questions you’ll want to answer and the KPIs you intend to measure.

A firm understanding of the business requirements will enable you to later map these demands back to the data and the types of analyses you’ll want to perform. Failing to understand at the outset what the business expects to see can lead to lots of wasted time and effort further down the line – so don’t skip this step!

Once you have a firm grasp of what your business expects to see as the final product of the analysis, you’ll want to start diving into the data. The first thing you’ll want to do is to find it.

Where is the data?

The first set of questions refers to the physical locations in which your organization’s data is stored. For a small deployment, this could be as simple as a series of spreadsheets; for larger ones, you might be looking at multiple databases, Hadoop data lakes, cloud sources or a data warehouse (we’ve written elsewhere about the differences between databases, data marts, and data warehouses).

You will also need to find out whether you have the required permissions to access the data, and which types or formats of data you’ll be dealing with.

The questions you want to ask in this stage are:

  • Which data sources does my organization work with?
  • Do I have the required permissions or credentials to access the data?
  • What is the size of each dataset and how much data will I need to get from each one?
  • How familiar am I with the underlying tables and schema in each database?
  • Do I need all the data for more granular analysis, or do I need a subset to ensure faster performance?
  • Will the data need to be standardized due to disparity – e.g., by combining data from an SQL database with a NoSQL source such as MongoDB?
  • Will I need to analyze data from external sources, which resides outside of my organization’s data stores?
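As a rough illustration of the standardization question above, the sketch below maps rows from a relational query and documents from a document store into one common record shape. The field names (`signedUp`, `profile`), the sample values and the helper functions are all hypothetical; a real project would read them from the actual sources.

```python
from datetime import date

# Hypothetical rows as a SQL query might return them: (id, name, signup_date)
sql_rows = [
    (1, "Acme Corp", "2023-01-15"),
    (2, "Globex", "2023-02-01"),
]

# Hypothetical documents as a document store such as MongoDB might return them
documents = [
    {"_id": "a3", "profile": {"name": "Initech"}, "signedUp": "2023-03-10"},
    {"_id": "a4", "profile": {"name": "Umbrella"}},  # signup date missing
]


def from_sql(row):
    """Convert one SQL tuple into the common record shape."""
    customer_id, name, signup = row
    return {
        "source": "sql",
        "id": str(customer_id),
        "name": name,
        "signup_date": date.fromisoformat(signup),
    }


def from_document(doc):
    """Convert one nested document into the common record shape."""
    signup = doc.get("signedUp")
    return {
        "source": "docs",
        "id": str(doc["_id"]),
        "name": doc.get("profile", {}).get("name"),
        "signup_date": date.fromisoformat(signup) if signup else None,
    }


def standardize(rows, docs):
    """Return one flat list of records with identical keys."""
    return [from_sql(r) for r in rows] + [from_document(d) for d in docs]


customers = standardize(sql_rows, documents)
```

Missing values are kept as `None` rather than guessed, so gaps in a source stay visible when you later ask whether the data is complete.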

Do you need to change the data?

Often data needs to be manually transformed or manipulated for effective analysis. This could be relevant when various tables or datasets use different formats for the same information, when the data is inconsistent or contains duplicate information, or when you want to group data in new ways.

Here’s what you’ll want to be asking:

  • For each individual source – is it complete? Accurate? Up to date?
  • In its current state, can I use the data to answer my business questions?
  • If there are inconsistencies or redundant values, what do I need to do to clean the data? Is it a matter of manually changing a few values or will a more systematic approach be necessary?
  • Will I be able to change the data in its original location, or would this need to be done in a secondary environment (e.g. cases where you do not have permissions to alter production data)?
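When a handful of manual fixes is not enough, a systematic cleanup might look like the sketch below. It works on a copy, in the spirit of the secondary-environment question above, normalizes inconsistent formatting and drops duplicates. The records, the `COUNTRY_ALIASES` mapping and the choice of email as the deduplication key are assumptions made for illustration.

```python
import copy

# Hypothetical records with inconsistent formatting and one duplicate
raw_records = [
    {"email": "Ann@Example.com ", "country": "US"},
    {"email": "ann@example.com", "country": "usa"},
    {"email": "bob@example.com", "country": "United States"},
    {"email": "eve@example.com", "country": "DE"},
]

# Assumed mapping from country spellings to one canonical code
COUNTRY_ALIASES = {"us": "US", "usa": "US", "united states": "US", "de": "DE"}


def clean(records):
    """Return a cleaned copy; the input list is left untouched."""
    cleaned, seen = [], set()
    for record in copy.deepcopy(records):
        record["email"] = record["email"].strip().lower()
        key = record["country"].strip().lower()
        # Unknown spellings are kept as-is (trimmed) instead of being guessed
        record["country"] = COUNTRY_ALIASES.get(key, record["country"].strip())
        if record["email"] in seen:
            continue  # duplicate of a record that was already kept
        seen.add(record["email"])
        cleaned.append(record)
    return cleaned


cleaned_records = clean(raw_records)
```

Keeping the first occurrence of each email is only one possible rule; which duplicate to keep is a business decision.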

How will you connect the data?

If you’re working with many different data sources and tables, you’ll need to model the data in a way that enables dashboard users to quickly receive answers to ad-hoc queries by connecting related fields in different tables. The relationships between the various entities in your data model will determine the types of queries your future analysis will be able to answer, as well as the efficiency with which it does so.

Start by asking:

  • Which fields are appropriate to connect data together, from a business viewpoint?
  • What relationship will occur once these fields are connected? You’ll want to avoid many-to-many relationships.
  • Will my data model scale?
  • How easy will it be to add data sources and make changes to the model further down the road?
  • Can we simplify the relationship without affecting performance? Note that this might depend on the data preparation and analytics tools that you’re using.
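One way to check what relationship a candidate join field would create is to count how often each key value repeats on either side. The function below is a minimal sketch of that idea; the `relationship` helper and the sample key lists are hypothetical, and the check says nothing about whether the field makes sense from a business viewpoint.

```python
from collections import Counter


def relationship(left_keys, right_keys):
    """Classify the relationship created by joining two tables on a key.

    Each argument is the list of values found in that table's join column.
    """
    left_repeats = any(n > 1 for n in Counter(left_keys).values())
    right_repeats = any(n > 1 for n in Counter(right_keys).values())
    if left_repeats and right_repeats:
        return "many-to-many"
    if left_repeats:
        return "many-to-one"
    if right_repeats:
        return "one-to-many"
    return "one-to-one"


# Hypothetical join columns: customers.id and orders.customer_id
customer_ids = [1, 2, 3]
order_customer_ids = [1, 1, 2, 3, 3]
customers_to_orders = relationship(customer_ids, order_customer_ids)

# Joining two tables on a non-unique field such as a category name
orders_by_category = ["toys", "toys", "books"]
campaigns_by_category = ["toys", "books", "books"]
orders_to_campaigns = relationship(orders_by_category, campaigns_by_category)
```

A "many-to-many" result is a signal to look for a different key or to introduce an intermediate table before connecting the two.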

Do you need to further consolidate the data?

For certain types of more complex analyses, you might want to create new tables on top of your existing ones. One example of this can be a funnel analysis, in which you would want to take the basic information about an ongoing, multi-stage process and create various buckets into which each record would be categorized. Examples of questions that can help you understand whether you’re ready to go include:

  • Do I need to create summary tables for the types of analysis I want to perform?
  • Do I need to join data from the tables I’m working with using an inner or outer join, or to combine these tables to create a new one?
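To make the funnel example and the join question more concrete, here is a small sketch using Python's built-in `sqlite3` module. The `leads` and `stages` tables, the stage names and the bucket labels are invented for illustration. It builds a summary table with one row per lead and shows why the join type matters: the outer (left) join keeps a lead that has no stage rows yet, while an inner join would silently drop it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, source TEXT);
    CREATE TABLE stages (lead_id INTEGER, stage TEXT);
    INSERT INTO leads VALUES (1, 'web'), (2, 'web'), (3, 'email'), (4, 'email');
    INSERT INTO stages VALUES (1, 'contacted'), (1, 'demo'), (1, 'won'),
                              (2, 'contacted'), (2, 'demo'),
                              (3, 'contacted');

    -- Summary table: one row per lead with the furthest stage reached.
    -- LEFT JOIN keeps lead 4, which has no stage rows yet.
    CREATE TABLE funnel AS
    SELECT l.lead_id,
           l.source,
           CASE COALESCE(MAX(CASE s.stage WHEN 'contacted' THEN 1
                                          WHEN 'demo' THEN 2
                                          WHEN 'won' THEN 3 END), 0)
                WHEN 0 THEN 'new'
                WHEN 1 THEN 'contacted'
                WHEN 2 THEN 'demo'
                ELSE 'won'
           END AS bucket
    FROM leads AS l
    LEFT JOIN stages AS s ON s.lead_id = l.lead_id
    GROUP BY l.lead_id, l.source;
""")

# Number of leads in each funnel bucket
bucket_counts = dict(conn.execute(
    "SELECT bucket, COUNT(*) FROM funnel GROUP BY bucket"))

# The same leads counted through an inner join: lead 4 disappears
inner_join_leads = conn.execute(
    "SELECT COUNT(DISTINCT l.lead_id) "
    "FROM leads AS l JOIN stages AS s ON s.lead_id = l.lead_id"
).fetchone()[0]
```

Whether the missing lead should appear in the analysis is a business question; the point is to choose the join deliberately rather than by default.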

How will you import the data?

While there are certain situations in which you would create reports and analysis by querying the production databases, most BI tools and implementations will rely on creating an amalgamation of the data in a secondary environment which will serve as your analytical database. The questions you want to ask include:

  • Does the local or cloud server I move my data to have sufficient software and hardware to crunch the amounts of data I’m dealing with? The two are somewhat dependent, as the right software can reduce hardware costs.
  • At what frequency do I need to import the data? This depends on the rate at which the original data changes or grows.
  • How will importing the data affect my production environment?

How will you verify the results?

Before you can proudly announce that the data preparation is complete, you’ll want to make sure that the end result is accurate and that you haven’t made any mistakes along the way. To verify the data, ask questions such as:

  • Does it make sense on a general level?
  • Are the measures I’m seeing in line with what I already know about the business?
  • Do calculations in my analytical environment return the same results as the same calculations performed manually on the original data?
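The last question can be partly automated. The sketch below compares a row count and a total computed in a stand-in analytical database (an in-memory SQLite table) against the same figures computed directly from the original rows. The `orders` table, the sample amounts and the `verify` helper are assumptions for illustration; a real check would cover the measures your business questions depend on.

```python
import math
import sqlite3

# Hypothetical original data: (order_id, amount)
source_orders = [(1, 19.99), (2, 5.00), (3, 42.50), (4, 10.01)]

# Stand-in for the analytical database, loaded from the source rows
analytics = sqlite3.connect(":memory:")
analytics.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
analytics.executemany("INSERT INTO orders VALUES (?, ?)", source_orders)


def verify(conn, source_rows, tolerance=1e-9):
    """Compare row count and total amount between the two environments."""
    db_count, db_total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders").fetchone()
    manual_count = len(source_rows)
    manual_total = math.fsum(amount for _, amount in source_rows)
    return {
        "row_count_matches": db_count == manual_count,
        # Floating-point sums are compared with a tolerance, not with ==
        "total_matches": math.isclose(db_total, manual_total, abs_tol=tolerance),
    }


report = verify(analytics, source_orders)
```

A mismatch in either field points to rows lost or duplicated during import, or to a calculation that is defined differently in the two environments.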

Start analyzing!

After you’ve gone through the entire checklist above, you’ll have identified the data, transformed it, built your data model, moved the data to an analytical database and verified the results. This could be a matter of hours, days or more – depending on the amount of data you’re working with and its complexity.

If everything went well, you’re good to go – so go ahead and start building some dashboards! And read our guide to dashboard design to make sure you follow the core principles that will help you tell a clear and understandable story with your data.


 
