The Data Audit Process: The Initial Step in Building Successful Predictive Analytics Solutions

Richard Boire

Building predictive analytics solutions is very much in vogue for most organizations today. Historically, practitioners needed to educate businesses on the value of data mining and predictive analytics. Now, the concept and value of predictive analytics are widely accepted by most businesses. Demonstrating the value of solutions through various techniques and approaches represents the exciting component of predictive analytics. Many consultants and businesses will discuss their experience with predictive analytics and how it solved a particular business problem.

Yet minimal attention is given to the “unglamorous” side of predictive analytics, which is the data. A business problem is often posed, and one slide is devoted to the sources of data that were used to develop the solution. No attention is given to the rigor of examining these data sources in order to arrive at an optimal data environment for developing the predictive analytics solution. We refer to this rigor as the “Data Audit Process”. It is this process, and the discipline of being a “data grunt”, that provides the backbone for building predictive analytics solutions. But what does the data audit entail?

At the start of any predictive analytics project, the data requirements and data sources are defined. The practitioner writes a data extract document, and the data is then delivered to the practitioner. If twenty data files or tables are requested, a separate data audit is done on each file or table, which ultimately results in twenty data audit reports. But what does a data audit report contain? Three different types of output are created, which together yield detailed insight into the data and how it can be used in an analytical solution.

The first output is a report showing a random sample of 100 records from the file. This output simply provides a picture of what the actual table or file looks like. From this sample, the practitioner can begin to understand the composition of certain fields based on the values reported in those fields.
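As a rough illustration of this first output, the sketch below draws up to 100 records at random, without replacement, from a file that has already been loaded into memory. The function name sample_records, the seed argument, and the example customer records are assumptions made for illustration; they are not part of the audit process described here.

```python
import random


def sample_records(records, n=100, seed=None):
    """Return up to n records chosen at random, without replacement."""
    rng = random.Random(seed)      # a seed makes the sample repeatable
    k = min(n, len(records))       # small files are returned in full
    return rng.sample(records, k)


# Hypothetical file contents: 250 customer records.
records = [
    {"customer_id": i, "region": "East" if i % 2 else "West"}
    for i in range(250)
]

sample = sample_records(records, n=100, seed=42)
for row in sample[:5]:             # eyeball the first few sampled rows
    print(row)
```

Fixing the seed is a design choice: it lets the same 100 records be reproduced if the audit report is questioned later.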

The second output is a data diagnostics report. This output looks at each field within a given file. The report lists the field format, the number of missing values, and the number of unique values for each field in the file. Along with these diagnostics, the report also gives the mean and standard deviation for each numeric field in the file. This output begins to reveal the utility of a variable in any predictive analytics solution. For example, variables with more than 90% of their values reported as missing will not be useful in any analytics exercise. Variables that have only one unique outcome will also not be useful in any analytics solution.
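A minimal sketch of such a diagnostics report might look like the following. It assumes that missing values arrive as None or blank strings, that a field counts as numeric only when every non-missing value is a number, and that the sample standard deviation is wanted; the names is_missing and field_diagnostics, the low_utility flag, and the example records are all hypothetical.

```python
import statistics


def is_missing(value):
    """Assumption: missing values are encoded as None or blank strings."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def field_diagnostics(records, field):
    """Summarize one field: format, missing count, unique count, mean, std dev."""
    values = [r.get(field) for r in records]
    present = [v for v in values if not is_missing(v)]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    all_numeric = bool(present) and len(numeric) == len(present)

    n_missing = len(values) - len(present)
    pct_missing = n_missing / len(values) if values else 0.0
    n_unique = len(set(present))

    if not present:
        fmt = "unknown"
    else:
        fmt = "numeric" if all_numeric else "text"

    return {
        "field": field,
        "format": fmt,
        "n_missing": n_missing,
        "pct_missing": pct_missing,
        "n_unique": n_unique,
        "mean": statistics.mean(numeric) if all_numeric else None,
        # Sample standard deviation needs at least two values.
        "std_dev": statistics.stdev(numeric)
        if all_numeric and len(numeric) > 1 else None,
        # Rules of thumb from the text: over 90% missing, or a single outcome.
        "low_utility": pct_missing > 0.90 or n_unique <= 1,
    }


# Hypothetical records for illustration only.
records = [
    {"age": 34, "region": "East", "fax": None, "country": "CA"},
    {"age": 51, "region": "West", "fax": None, "country": "CA"},
    {"age": None, "region": "East", "fax": None, "country": "CA"},
    {"age": 45, "region": "", "fax": None, "country": "CA"},
]

report = {f: field_diagnostics(records, f) for f in records[0]}
for field, diag in report.items():
    print(field, diag)
```

In this toy data, the fax field is flagged because every value is missing and the country field is flagged because it has only one unique outcome.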

The third output is a set of frequency distribution reports, one for each variable in the file. These reports provide a more detailed view of the field or variable by displaying how the outcomes or values are distributed within a given field. Besides yielding additional information about which variables will be useful in a future predictive analytics exercise, frequency reports also provide insights on how to derive new variables from the source variables.
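A frequency distribution report for a single field could be sketched as below. The function name frequency_report, the "(missing)" label, and the example region values are assumptions for illustration, and missing values are again assumed to be None or empty strings.

```python
from collections import Counter


def frequency_report(records, field):
    """Return (value, count, share) rows for one field, most frequent first."""
    counts = Counter(
        "(missing)" if r.get(field) in (None, "") else r.get(field)
        for r in records
    )
    total = sum(counts.values())
    return [(value, count, count / total)
            for value, count in counts.most_common()]


# Hypothetical field values for illustration only.
regions = ["East", "East", "West", "East", None, "North", "West", "East"]
records = [{"region": value} for value in regions]

rows = frequency_report(records, "region")
for value, count, share in rows:
    print(f"{value:<10} {count:>3} {share:6.1%}")
```

A report like this can hint at derived variables. For instance, values that each account for only a small share of records might be grouped into a single category, although whether that is sensible depends on the business problem.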

Although these outputs themselves are not revolutionary, this discipline of data investigation represents the initial process in any analytics exercise. It is this initial process that provides the framework for creating the all-important analytical file that will be used to develop the predictive model. Proceeding without this framework is akin to trying to read without understanding the alphabet. In the next blog post, I will discuss what we need to consider in creating a robust analytical file once this framework is established.
