SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

The Data Audit Process: The Initial Step in Building Successful Predictive Analytics Solutions

Richard Boire

Building predictive analytics solutions is very much in vogue for most organizations today. Historically, practitioners needed to educate businesses on the value of data mining and predictive analytics. Now, the concept and value of predictive analytics are widely accepted by most businesses. Demonstrating the value of solutions through various techniques and approaches represents the exciting component of predictive analytics. Many consultants and businesses will discuss their experience with predictive analytics and how it solved a particular business problem.

Yet minimal attention is given to the “unglamorous” side of predictive analytics, which is the “Data”. A business problem is typically presented, and a single slide is devoted to the data sources used to develop the solution. No attention is given to the rigor of examining these data sources in order to arrive at an optimal data environment for developing the predictive analytics solution. We refer to this rigor as the “Data Audit Process”. Yet it is this process, and the discipline of being a “data grunt”, that provides the backbone for building predictive analytics solutions. So what does the data audit entail?

At the start of any predictive analytics project, the data requirements and data sources are defined. The practitioner writes a data extract document, and the data is then delivered to the practitioner. If twenty data files or tables are requested, a separate data audit is done on each file or table, which ultimately results in twenty data audit reports. But what does a data audit report contain? Three different types of output are created, which together yield detailed insight into the data and how it can be used in an analytical solution.
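
As a rough sketch of the one-audit-per-table rule, the Python below (standard library only) runs a separate audit on every delivered table and keeps exactly one report per table. The function names `load_csv`, `audit_table` and `audit_all`, and the assumption that each extract arrives as a CSV file with a header row, are illustrative choices rather than part of the author's process. The per-table summary here is a deliberately minimal placeholder that the three outputs described below would replace.

```python
import csv
import io


def load_csv(path):
    """Read one extract file (assumed: CSV with a header row) into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def audit_table(name, rows):
    """Placeholder audit for a single table: record count and field names."""
    fields = list(rows[0].keys()) if rows else []
    return {"table": name, "records": len(rows), "fields": fields}


def audit_all(tables):
    """Audit every table separately, producing exactly one report per table."""
    return {name: audit_table(name, rows) for name, rows in tables.items()}


# With real extracts one might load a whole (hypothetical) directory:
# tables = {p.stem: load_csv(p) for p in pathlib.Path("extracts").glob("*.csv")}
tables = {
    "customers": list(csv.DictReader(io.StringIO("id,age\n1,34\n2,\n"))),
    "orders": list(csv.DictReader(io.StringIO("order_id,amount\n10,5.50\n"))),
}
reports = audit_all(tables)
for name, report in reports.items():
    print(name, report["records"], report["fields"])
```

Keeping the reports in a dictionary keyed by table name makes it easy to confirm that twenty requested tables really produced twenty reports.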

The first output is a report showing a random sample of 100 records from the file. This output simply provides a picture of what the actual table or file looks like. From this sample, the practitioner can begin to better understand the composition of certain fields based on the values and outcomes reported in those fields.
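
A minimal sketch of this first output, assuming each file has already been loaded as a list of dictionaries (one per record, as `csv.DictReader` produces). `sample_records` and `print_sample` are hypothetical helper names, and the optional `seed` is an added convenience so the same 100 records can be revisited later.

```python
import random


def sample_records(rows, n=100, seed=None):
    """Draw up to n records at random, without replacement."""
    if len(rows) <= n:
        return list(rows)  # small file: show every record
    rng = random.Random(seed)  # a seeded generator gives a repeatable sample
    return rng.sample(rows, n)


def print_sample(rows, n=100, seed=None):
    """Print the sampled records as a tab-separated table with a header row."""
    sample = sample_records(rows, n, seed)
    if not sample:
        return
    fields = list(sample[0].keys())
    print("\t".join(fields))
    for record in sample:
        print("\t".join(str(record.get(f, "")) for f in fields))


# Hypothetical file of 1,000 records; print only 5 here to keep the demo short
rows = [{"id": i, "region": "east" if i % 2 else "west"} for i in range(1000)]
print_sample(rows, n=5, seed=42)
```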

The second output is a data diagnostics report. This output looks at each field within a given file and reports its format, its number of missing values, and its number of unique values. Along with these diagnostics, the report also outputs the mean and standard deviation of each numeric field in the file. This output begins to reveal the utility of a variable in any predictive analytics solution. For example, variables with more than 90% of their values missing will not be useful in any analytics exercise. Variables that have only one unique outcome will also not be useful in any analytics solution.
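
The diagnostics report can be sketched in the same way. In this hypothetical `diagnose` function, a value counts as missing when it is `None` or blank, a field is treated as numeric only if every non-missing value parses as a number, unique values are counted over non-missing values only, and the standard deviation is the sample standard deviation from `statistics.stdev`. The two flags encode the rules of thumb from the paragraph above (more than 90% missing, a single unique value). All of these conventions are assumptions a practitioner would adjust to their own data.

```python
import statistics


def _is_missing(value):
    """Assumption: None or a blank string counts as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def _to_float(value):
    """Return value as a float, or None if it does not parse as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def diagnose(rows, missing_threshold=0.90):
    """Per-field format, missing count, unique count, and mean/std for numeric fields."""
    fields = list(rows[0].keys()) if rows else []
    report = {}
    for field in fields:
        values = [r.get(field) for r in rows]
        present = [v for v in values if not _is_missing(v)]
        nums = [_to_float(v) for v in present]
        numeric = bool(present) and all(x is not None for x in nums)
        entry = {
            "format": "numeric" if numeric else "text",
            "missing": len(values) - len(present),
            "unique": len(set(present)),  # distinct non-missing values
        }
        if numeric:
            entry["mean"] = statistics.mean(nums)
            entry["std"] = statistics.stdev(nums) if len(nums) > 1 else None
        flags = []
        if entry["missing"] / len(rows) > missing_threshold:
            flags.append(f"more than {missing_threshold:.0%} missing")
        if entry["unique"] == 1:
            flags.append("single unique value")
        entry["flags"] = flags
        report[field] = entry
    return report


# Hypothetical file: a constant field and a field that is 95% blank
rows = [
    {"id": str(i), "segment": "A", "fax": "" if i < 95 else "555-0100", "spend": str(i * 2)}
    for i in range(100)
]
for field, entry in diagnose(rows).items():
    print(field, entry)
```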

The third output is a set of frequency distribution reports, one for each variable in the file. These reports provide a more detailed view of each field or variable by displaying how its values are distributed. Besides yielding additional information about which variables will be useful in a future predictive analytics exercise, frequency reports also provide insights into how to derive new variables from the source variables.
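
Frequency reports amount to a count per value. The sketch below uses `collections.Counter` to produce value, count and percentage rows for one field, listing missing values as their own `<missing>` category. The hypothetical `collapse_rare` function illustrates one common way a frequency report can suggest a derived variable: grouping values that fall below a minimum share (5% here, an arbitrary assumption) into a single `OTHER` bucket. It is only one of many possible derivations, not the author's prescribed method.

```python
from collections import Counter

MISSING = "<missing>"


def _label(value):
    """Report absent or blank values as an explicit <missing> category."""
    return MISSING if value is None or str(value).strip() == "" else value


def frequency_report(rows, field):
    """Return (value, count, percent) rows for one field, most frequent first."""
    counts = Counter(_label(r.get(field)) for r in rows)
    total = sum(counts.values())
    return [(value, count, 100.0 * count / total) for value, count in counts.most_common()]


def collapse_rare(rows, field, min_pct=5.0, other="OTHER"):
    """Derive a new variable: values held by fewer than min_pct% of records become `other`."""
    keep = {value for value, _, pct in frequency_report(rows, field) if pct >= min_pct}
    derived = []
    for r in rows:
        label = _label(r.get(field))
        derived.append(label if label in keep else other)
    return derived


# Hypothetical field with two dominant values and a long tail
rows = ([{"region": "east"}] * 60 + [{"region": "west"}] * 37
        + [{"region": "north"}] * 2 + [{"region": ""}])
for value, count, pct in frequency_report(rows, "region"):
    print(f"{value:<10}{count:>5}{pct:>7.1f}%")
print(Counter(collapse_rare(rows, "region")))
```

Treating missing values as a visible category, rather than dropping them, keeps the frequency report consistent with the missing-value counts in the diagnostics report.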

Although these outputs are not revolutionary in themselves, this discipline of “data” investigation represents the initial process in any analytics exercise. It is this initial process that provides the framework for creating the all-important analytical file, which will be used to develop the predictive model. Without this framework, building a model is akin to trying to read without understanding the alphabet. In the next blog, I will discuss what we need to consider in creating a robust analytical file once this framework is established.
