24 Questions to Ask when Preparing Data for Analysis

Eran Levy
Last updated: 2016/09/29 at 1:36 PM


Data preparation is perhaps the most important step in any type of serious data analysis. And while it would be ludicrous to attempt to cover such a broad field of knowledge in one article, we’ve prepared a quick checklist that you can run through when preparing data for analysis. Hopefully, this will help you optimize the data preparation process and make sure you have all the important steps and bases covered.

Before you start: define the business questions

We’ve written before about questions to ask during requirements elicitation, but as a general guideline, any type of data analysis starts with becoming familiar with the business questions you want to answer and the KPIs you intend to measure.

A firm understanding of the business requirements will enable you to later map these demands back to the data and the types of analyses you’ll want to perform. Failing to understand at the outset what the business expects to see can lead to lots of wasted time and effort further down the line, so don’t skip this step!

Once you have a firm grasp of what your business expects to see as the final product of the analysis, you’ll want to start diving into the data. The first thing you’ll want to do is to find it.

Where is the data?

The first set of questions refers to the physical locations in which your organization’s data is stored. For a small deployment, this could be as simple as a series of spreadsheets; for larger ones, you might be looking at multiple databases, Hadoop data lakes, cloud sources or a data warehouse (read about the differences between databases, data marts, and data warehouses here).

You will also need to find out whether you have the required permissions to access the data, and which types or formats of data you’ll be dealing with.

The questions you want to ask in this stage are:

  • Which data sources does my organization work with?
  • Do I have the required permissions or credentials to access the data?
  • What is the size of each dataset and how much data will I need to get from each one?
  • How familiar am I with the underlying tables and schema in each database?
  • Do I need all the data for more granular analysis, or do I need a subset to ensure faster performance?
  • Will the data need to be standardized because of disparities between sources, e.g. when combining data from an SQL database with a NoSQL source such as MongoDB?
  • Will I need to analyze data from external sources that reside outside of my organization’s data stores?
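
Several of these questions (dataset size, familiarity with tables and schema) can be answered by profiling each source before you pull anything from it. The following is a minimal sketch, assuming the source is a SQLite database; the `profile_database` helper and the `customers` and `orders` tables are hypothetical and stand in for whatever your organization actually stores. Other databases expose the same information through their own catalog views.

```python
import sqlite3

def profile_database(conn):
    """Return {table_name: {"rows": int, "columns": [names]}} for every table."""
    profile = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Names come from the database catalog; quote them so unusual names still work.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
        profile[table] = {"rows": rows, "columns": columns}
    return profile

# Hypothetical in-memory database standing in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 9.5), (2, 1, 20.0), (3, 2, 5.0)]
)

for table, info in profile_database(conn).items():
    print(table, info["rows"], info["columns"])
```

A row count and column list per table is usually enough to decide whether you need the full dataset or only a subset.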

Do you need to change the data?

Often data needs to be manually transformed or manipulated for effective analysis. This could be relevant when various tables or datasets use different formats for the same information, when the data is inconsistent or contains duplicate information, or when you want to group data in new ways.

Here’s what you’ll want to be asking:

  • For each individual source – is it complete? Accurate? Up to date?
  • In its current state, can I use the data to answer my business questions?
  • If there are inconsistencies or redundant values, what do I need to do to clean the data? Is it a matter of manually changing a few values or will a more systematic approach be necessary?
  • Will I be able to change the data in its original location, or would this need to be done in a secondary environment (e.g. cases where you do not have permissions to alter production data)?
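
When a few manual corrections are not enough, a systematic cleaning step might look like the sketch below. It is an illustration only: the field names (`email`, `signup_date`), the list of accepted date formats and the `clean` helper are all assumptions, and the records are made up. The cleaning runs on a copy of the data, so nothing is changed in its original location.

```python
from datetime import datetime

# Assumed formats; extend this tuple to match what your sources actually contain.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def normalize_date(value):
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(records):
    """Standardize each record, drop duplicates, and set aside rows that cannot be fixed."""
    seen, cleaned, rejected = set(), [], []
    for rec in records:
        date = normalize_date(rec["signup_date"])
        if date is None:
            rejected.append(rec)  # keep for manual review instead of silently dropping
            continue
        row = (rec["email"].strip().lower(), date)
        if row not in seen:
            seen.add(row)
            cleaned.append({"email": row[0], "signup_date": row[1]})
    return cleaned, rejected

# Hypothetical records: the first two describe the same signup in different formats.
raw = [
    {"email": "Ada@example.com ", "signup_date": "2016-09-29"},
    {"email": "ada@example.com", "signup_date": "29/09/2016"},
    {"email": "grace@example.com", "signup_date": "20160901"},
    {"email": "bob@example.com", "signup_date": "next week"},
]
cleaned, rejected = clean(raw)
print(len(cleaned), "clean rows,", len(rejected), "rejected")
```

Returning the rejected rows separately makes it easy to judge whether the remaining problems are a handful of values or a sign that the source itself needs attention.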

How will you connect the data?

If you’re working with many different data sources and tables, you’ll need to model the data in a way that enables dashboard users to quickly receive answers to ad-hoc queries by connecting related fields in different tables. The relationships between the various entities in your data model will determine the types of queries your future analysis will be able to answer, as well as the efficiency with which it does so.

Start by asking:

  • Which fields are appropriate to connect data together, from a business viewpoint?
  • What relationship will occur once these fields are connected? You’ll want to avoid many-to-many relationships.
  • Will my data model scale?
  • How easy will it be to add data sources and make changes to the model further down the road?
  • Can we simplify the relationship without affecting performance? Note that this might depend on the data preparation and analytics tools that you’re using.
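
One way to find out which relationship a pair of fields would produce is to check whether the values repeat on each side before connecting them. The sketch below assumes the key columns fit in memory as plain lists; the `relationship` function and the sample keys are hypothetical.

```python
from collections import Counter

def relationship(left_keys, right_keys):
    """Classify the join between two key columns by checking for repeated values."""
    left_unique = max(Counter(left_keys).values(), default=1) == 1
    right_unique = max(Counter(right_keys).values(), default=1) == 1
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"

# Hypothetical keys: each customer id appears once, but can have several orders.
customer_ids = [1, 2, 3]
order_customer_ids = [1, 1, 2]
print(relationship(customer_ids, order_customer_ids))

# Region repeats on both sides, so joining on it would multiply rows.
campaign_regions = ["EU", "EU", "US"]
sales_regions = ["EU", "US", "US"]
print(relationship(campaign_regions, sales_regions))
```

If a pair of fields turns out to be many-to-many, a common remedy is to connect both tables through an intermediate table in which the key is unique.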

Do you need to further consolidate the data?

For certain types of more complex analyses, you might want to create new tables on top of your existing ones. One example of this can be a funnel analysis, in which you would want to take the basic information about an ongoing, multi-stage process and create various buckets into which each record would be categorized. Examples of questions that can help you understand whether you’re ready to go include:

  • Do I need to create summary tables for the types of analysis I want to perform?
  • Do I need to join data from the tables I’m working with using an inner or outer join, or combine these tables to create a new one?
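
As an illustration of a summary table for a funnel analysis, the sketch below assigns each record to a bucket according to the furthest stage it has reached and stores the counts in a new table. The `leads` table, its stage columns and the bucket labels are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leads ("
    "id INTEGER PRIMARY KEY, contacted INTEGER, demo_given INTEGER, purchased INTEGER)"
)
conn.executemany("INSERT INTO leads VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1),
    (2, 1, 1, 0),
    (3, 1, 0, 0),
    (4, 0, 0, 0),
    (5, 1, 1, 0),
])

# Summary table: one row per funnel bucket, based on the furthest stage reached.
conn.execute("""
    CREATE TABLE funnel_summary AS
    SELECT
        CASE
            WHEN purchased = 1 THEN '4. purchased'
            WHEN demo_given = 1 THEN '3. demo'
            WHEN contacted = 1 THEN '2. contacted'
            ELSE '1. new'
        END AS bucket,
        COUNT(*) AS leads
    FROM leads
    GROUP BY bucket
    ORDER BY bucket
""")

summary = dict(conn.execute("SELECT bucket, leads FROM funnel_summary"))
for bucket, count in summary.items():
    print(bucket, count)
```

Dashboards can then query the small `funnel_summary` table instead of recomputing the buckets from the detailed records on every request.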

How will you import the data?

While there are certain situations in which you would create reports and analysis by querying the production databases, most BI tools and implementations will rely on creating an amalgamation of the data in a secondary environment which will serve as your analytical database. The questions you want to ask include:

  • Does the local or cloud server I move my data to have sufficient software and hardware to crunch the amounts of data I’m dealing with? The two are somewhat dependent, as the right software can reduce hardware costs.
  • At what frequency do I need to import the data? This depends on the rate at which the original data changes or grows.
  • How will importing the data affect my production environment?
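
If the original data grows steadily, one way to limit the load on production is to import only the rows that changed since the previous run. The sketch below assumes every row carries an `updated_at` timestamp stored as an ISO 8601 string; the `orders` schema and the `incremental_import` function are hypothetical, and two in-memory SQLite databases stand in for the production and analytical environments.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)"

def incremental_import(source, target):
    """Copy only rows newer than the latest updated_at already in the target."""
    # ISO 8601 strings sort chronologically, so MAX() gives the latest import point.
    watermark = target.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders"
    ).fetchone()[0]
    new_rows = source.execute(
        "SELECT id, total, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with target:  # commit the whole batch, or roll it back on error
        target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", new_rows)
    return len(new_rows)

source = sqlite3.connect(":memory:")  # stands in for the production database
target = sqlite3.connect(":memory:")  # stands in for the analytical database
source.execute(SCHEMA)
target.execute(SCHEMA)
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2016-09-28T09:00:00"),
    (2, 25.0, "2016-09-29T13:36:00"),
])

first = incremental_import(source, target)   # initial load copies everything
source.execute("INSERT INTO orders VALUES (3, 40.0, '2016-09-30T08:00:00')")
second = incremental_import(source, target)  # later run copies only the new row
print(first, second)
```

This simple watermark can miss a row written with exactly the same timestamp as the last imported one, and it never sees deletions, so a periodic full reload or a comparison of row counts is still worth scheduling.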

How will you verify the results?

Before you can proudly announce that the data preparation is complete, you’ll want to make sure that the end result is accurate and that you haven’t made any mistakes along the way. To verify the data, ask questions such as:

  • Does it make sense on a general level?
  • Are the measures I’m seeing in line with what I already know about the business?
  • Do calculations in my analytical environment return the same results as the same calculations performed manually on the original data?
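
The last check can be automated by computing the same totals on both sides and listing where they differ. This is a minimal sketch under assumed names: the `reconcile` function, the `region` and `revenue` fields and the sample rows are all hypothetical.

```python
import math

def reconcile(source_rows, analytical_rows, key, measure, rel_tol=1e-9):
    """Compare per-key totals from two datasets and return the keys that disagree."""
    def totals(rows):
        out = {}
        for row in rows:
            out[row[key]] = out.get(row[key], 0.0) + row[measure]
        return out

    src, dst = totals(source_rows), totals(analytical_rows)
    mismatches = {}
    for k in sorted(src.keys() | dst.keys()):
        a, b = src.get(k), dst.get(k)
        # A key missing on either side counts as a mismatch too.
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[k] = (a, b)
    return mismatches

# Hypothetical data: detailed rows from the source, aggregated rows from the analytical copy.
source_rows = [
    {"region": "EU", "revenue": 100.0},
    {"region": "EU", "revenue": 50.0},
    {"region": "US", "revenue": 75.0},
]
analytical_rows = [
    {"region": "EU", "revenue": 150.0},
    {"region": "US", "revenue": 70.0},
]
mismatches = reconcile(source_rows, analytical_rows, key="region", measure="revenue")
print(mismatches)
```

An empty result means the two environments agree on that measure; any entry points to the key where the preparation steps should be reviewed.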

Start analyzing!

After you’ve gone through the entire checklist above, you’ll have identified the data, transformed it, built your data model, moved the data to an analytical database and verified the results. This could be a matter of hours, days or more – depending on the amount of data you’re working with and its complexity.

If everything went well, you’re good to go – so go ahead and start building some dashboards! And read our guide to dashboard design to make sure you follow the core principles that will help you tell a clear and understandable story with your data.

