SmartData Collective

The Data Audit Process: The Initial Step in Building Successful Predictive Analytics Solutions

Richard Boire

Building predictive analytics solutions is very much in vogue for most organizations today. Historically, practitioners needed to educate businesses on the value of data mining and predictive analytics. Now, the concept and value of predictive analytics are widely accepted by most businesses. Demonstrating the value of solutions through various techniques and approaches is the exciting component of predictive analytics. Many consultants and businesses will discuss their experience with predictive analytics and how it solved a particular business problem.

Yet minimal attention is given to the “unglamorous” side of predictive analytics: the data. A business problem is often posed, and one slide is devoted to the sources of data that were used to develop the solution. No attention is given to the rigor of examining these data sources in order to arrive at an optimal data environment for developing the predictive analytics solution. We refer to this rigor as the “Data Audit Process”. It is this process, and the discipline of being a “data grunt”, that provides the backbone for building predictive analytics solutions. But what does the data audit entail?

At the start of any predictive analytics project, the data requirements and data sources are defined. The practitioner writes a data extract document, and the data is then delivered to the practitioner. If twenty data files or tables are requested, a separate data audit is done on each file or table, which ultimately results in twenty data audit reports. But what does a data audit report contain? Three different types of output are created, which together yield detailed insight into the data and how it can be used in an analytical solution.

The first output is a report showing a random sample of 100 records from the file. Its purpose is simply to give us a picture of what the actual table or file looks like. From this sample, the practitioner can begin to understand the composition of certain fields based on the values reported in them.
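
As a rough illustration of this first output, the sketch below draws a random sample of up to 100 records from a table held in memory. The function name `sample_records`, the `seed` parameter, and the `customers` table are assumptions made for this example; they are not part of the audit process described here.

```python
import random


def sample_records(records, n=100, seed=None):
    """Return up to n records chosen at random, without replacement."""
    records = list(records)
    if len(records) <= n:
        # Small files are returned whole rather than sampled.
        return records
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(records, n)


# Hypothetical table of 1,000 customer rows.
customers = [
    {"customer_id": i, "region": "East" if i % 2 else "West"}
    for i in range(1000)
]

preview = sample_records(customers, n=100, seed=42)
for row in preview[:5]:
    print(row)
```

Fixing the seed is a design choice that lets the same 100 records be reproduced if the audit report is regenerated later.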

The second output is a data diagnostics report, which looks at each field within a given file. For every field, the report lists the field format, the number of missing values, and the number of unique values. It also gives the mean and standard deviation for each numeric field. This output begins to reveal the utility of a variable in any predictive analytics solution. For example, a variable with more than 90% of its values missing will not be useful in any analytics exercise. Variables that have only one unique outcome will not be useful either.
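
A minimal sketch of such a diagnostics report follows, assuming the file has already been loaded as a list of dictionaries and that `None` or an empty string marks a missing value. The function `diagnose`, the `usable` flag, and the sample rows are hypothetical; the flag simply encodes the two rules of thumb above (more than 90% missing, or only one unique value).

```python
import statistics


def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diagnose(records):
    """Build per-field diagnostics for a list of dict records."""
    # Collect field names in first-seen order.
    fields = []
    for row in records:
        for name in row:
            if name not in fields:
                fields.append(name)

    total = len(records)
    report = {}
    for name in fields:
        values = [row.get(name) for row in records]
        present = [v for v in values if v is not None and v != ""]
        numeric = bool(present) and all(is_number(v) for v in present)
        missing = total - len(present)
        if not present:
            fmt = "empty"
        elif numeric:
            fmt = "numeric"
        else:
            fmt = "text"
        entry = {
            "format": fmt,
            "missing": missing,
            "missing_share": missing / total if total else 0.0,
            "unique": len(set(present)),
            "mean": statistics.mean(present) if numeric else None,
            # Sample standard deviation needs at least two values.
            "std": statistics.stdev(present)
            if numeric and len(present) > 1 else None,
        }
        # Rules of thumb: drop fields that are >90% missing or constant.
        entry["usable"] = entry["missing_share"] <= 0.90 and entry["unique"] > 1
        report[name] = entry
    return report


# Hypothetical four-row extract.
rows = [
    {"age": 34, "income": 52000.0, "fax": None, "country": "CA"},
    {"age": 41, "income": None, "fax": None, "country": "CA"},
    {"age": 29, "income": 61000.0, "fax": None, "country": "CA"},
    {"age": 41, "income": 58000.0, "fax": None, "country": "CA"},
]

report = diagnose(rows)
for field, stats in report.items():
    print(field, stats)
```

In this toy extract, `fax` is entirely missing and `country` has a single value, so both would be flagged as not usable, while `age` and `income` pass.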

The third output is a set of frequency distribution reports, one for each variable in the file. These reports provide a more detailed view of a field or variable by displaying how its values are distributed. Besides yielding additional information about which variables will be useful in a future predictive analytics exercise, frequency reports also provide insight into how to derive new variables from the source variables.
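
The sketch below shows one way a frequency report might be produced, and one possible way it could inform a derived variable. The functions `frequency_report` and `collapse_rare`, the 10% threshold, and the `payments` data are all assumptions for illustration; grouping rare values into an "Other" category is just one example of a derivation, not a method prescribed here.

```python
from collections import Counter

MISSING_LABEL = "<missing>"


def label(row, field):
    """Return the field value, or a placeholder when it is missing."""
    value = row.get(field)
    return MISSING_LABEL if value is None or value == "" else value


def frequency_report(records, field):
    """Return (value, count, percent) rows, most common value first."""
    counts = Counter(label(row, field) for row in records)
    total = sum(counts.values())
    return [
        (value, n, round(100 * n / total, 1))
        for value, n in counts.most_common()
    ]


def collapse_rare(records, field, min_percent=10.0):
    """Derive a new variable that groups rare values into 'Other'."""
    keep = {
        value
        for value, _, pct in frequency_report(records, field)
        if pct >= min_percent and value != MISSING_LABEL
    }
    derived = []
    for row in records:
        value = label(row, field)
        if value == MISSING_LABEL or value in keep:
            derived.append(value)
        else:
            derived.append("Other")
    return derived


# Hypothetical field with 20 records.
payments = (
    [{"method": "card"}] * 12
    + [{"method": "cash"}] * 6
    + [{"method": "cheque"}]
    + [{"method": None}]
)

freq = frequency_report(payments, "method")
for value, count, pct in freq:
    print(f"{value:10} {count:3d} {pct:5.1f}%")

method_grouped = collapse_rare(payments, "method", min_percent=10.0)
```

Here the frequency report shows that "cheque" covers only 5% of records, which is the kind of observation that might prompt grouping it with other rare values.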

Although these outputs themselves are not revolutionary, this discipline of data investigation represents the initial process in any analytics exercise. It is this initial process that provides the framework for creating the all-important analytical file that will be used to develop the predictive model. Working without this framework is akin to trying to read without understanding the alphabet. In the next blog post, I will discuss what we need to consider in creating a robust analytical file once this framework is established.
