SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

The Data Audit Process: The Initial Step in Building Successful Predictive Analytics Solutions

Richard Boire

Building predictive analytics solutions is very much in vogue for most organizations today. Historically, practitioners needed to educate businesses on the value of data mining and predictive analytics. Now, the concept and value of predictive analytics are widely accepted by most businesses. Demonstrating the value of solutions through various techniques and approaches represents the exciting component of predictive analytics. Many consultants and businesses will discuss their experience with predictive analytics and how it solved a particular business problem.

Yet minimal attention is given to the “unglamorous” side of predictive analytics, which is the data. A business problem is often posed, and a single slide is devoted to the sources of data that were used to develop the solution. No attention is given to the rigor of examining these data sources in order to arrive at an optimum data environment for developing the predictive analytics solution. We refer to this rigor as the “Data Audit Process”. It is this process, and the discipline of being a “data grunt”, that provides the backbone for building predictive analytics solutions. But what does the data audit entail?

At the start of any predictive analytics project, the data requirements and data sources are defined. The practitioner writes a data extract document, and the data is then delivered to the practitioner. If twenty data files or tables are requested, a separate data audit is done on each file or table, which ultimately results in twenty data audit reports. But what does a data audit report contain? Three different types of output are created, which together yield detailed insight into the data and how it can be used in an analytical solution.

The first output is a report showing a random sample of 100 records from the file. This output simply provides a picture of what the actual table or file looks like. From this sample, the practitioner can begin to understand the composition of certain fields based on the values reported in them.
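
As a rough illustration of this first output, here is a minimal Python sketch that draws 100 records at random from a file that has already been loaded into memory. The function name `sample_records`, the `seed` parameter, and the made-up customer records are assumptions for illustration only; they are not part of any particular audit tool.

```python
import random


def sample_records(records, n=100, seed=None):
    """Return up to n records drawn at random, without replacement."""
    rng = random.Random(seed)      # a fixed seed makes the sample repeatable
    k = min(n, len(records))       # small files simply return every record
    return rng.sample(records, k)


# Hypothetical file contents: 250 made-up customer records.
records = [
    {"customer_id": i, "region": "East" if i % 2 else "West"}
    for i in range(250)
]

sample = sample_records(records, n=100, seed=42)

# Print the first few sampled rows to get a picture of the file.
for row in sample[:5]:
    print(row)
```

Sampling without replacement means no record shows up twice, so all 100 rows give a distinct view of the file.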

The second output is a data diagnostics report. This output looks at each field within a given file. The report lists the field format, the number of missing values, and the number of unique values for each field in the file. Along with these diagnostics, the report also gives the mean and standard deviation for each numeric field in the file. This output begins to reveal the utility of a variable in any predictive analytics solution. For example, a variable with more than 90% of its values reported as missing will not be useful in any analytics exercise. Variables that have only one unique outcome will also not be useful in any analytics solution.
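
The diagnostics described above could be sketched along the following lines in plain Python. This is only one possible reading of the report: the function names, the convention that `None` or an empty string counts as missing, the use of the sample standard deviation, and the example records are all assumptions made for illustration.

```python
import statistics


def is_missing(value):
    # Assumption: None and the empty string both count as missing.
    return value is None or value == ""


def diagnose_field(values, missing_threshold=0.90):
    """Summarize one field: format, missing count, unique count, mean, std dev."""
    present = [v for v in values if not is_missing(v)]
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    report = {
        "format": "numeric" if numeric else "text",
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "mean": None,
        "std_dev": None,
    }
    if numeric:
        report["mean"] = statistics.mean(present)
        # Assumption: sample standard deviation; the article does not say which.
        report["std_dev"] = statistics.stdev(present) if len(present) > 1 else 0.0
    missing_rate = report["missing"] / len(values) if values else 0.0
    # Flag fields that are more than 90% missing or have at most one unique value.
    report["usable"] = missing_rate <= missing_threshold and report["unique"] > 1
    return report


def diagnose_file(records):
    """Run the field diagnostics on every field of a list of dict records."""
    fields = list(records[0].keys())
    return {f: diagnose_field([r.get(f) for r in records]) for f in fields}


# Hypothetical records: "fax" is always missing, "country" never varies.
records = [
    {"age": 34, "region": "East", "fax": None, "country": "CA"},
    {"age": 51, "region": "West", "fax": None, "country": "CA"},
    {"age": None, "region": "East", "fax": None, "country": "CA"},
    {"age": 45, "region": "", "fax": None, "country": "CA"},
]

report = diagnose_file(records)
for field, stats in report.items():
    print(field, stats)
```

In this made-up file, `fax` and `country` would be flagged as not usable, the first because every value is missing and the second because it has a single unique value.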

The third output is a set of frequency distribution reports, one for each variable in the file. These reports provide a more detailed view of the field or variable by displaying how the outcomes or values are distributed within a given field. Besides yielding additional information about which variables will be useful in a future predictive analytics exercise, frequency reports also provide insights on how to derive new variables from the source variables.
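
A frequency distribution for a single field might be produced with something like the sketch below. The function name `frequency_report`, the `<missing>` label, and the sample `regions` values are assumptions for illustration; a real audit would run this over every variable in the file.

```python
from collections import Counter


def frequency_report(values):
    """Return (value, count, percent) rows, most frequent value first."""
    # Assumption: None and the empty string are grouped under one "<missing>" label.
    counts = Counter("<missing>" if v is None or v == "" else v for v in values)
    total = sum(counts.values())
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common()
    ]


# Hypothetical values for a single "region" field.
regions = ["East", "West", "East", None, "North", "East", "West", ""]

rows = frequency_report(regions)
for value, count, pct in rows:
    print(f"{value:<10} {count:>3} {pct:6.1f}%")
```

A thinly populated value in such a report, like `North` in this made-up field, is the kind of detail that might suggest grouping categories when deriving a new variable.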

Although these outputs are not revolutionary in themselves, this discipline of “data” investigation represents the initial process in any analytics exercise. It is this initial process that provides the framework for creating the all-important analytical file that will be used to develop the predictive model. Working without this framework is akin to trying to read without understanding the alphabet. In the next blog, I will discuss what we need to consider in creating a robust analytical file once this framework is established.
