Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    warehouse accidents
    Data Analytics and the Future of Warehouse Safety
    10 Min Read
    stock investing and data analytics
    How Data Analytics Supports Smarter Stock Trading Strategies
    4 Min Read
    predictive analytics risk management
    How Predictive Analytics Is Redefining Risk Management Across Industries
    7 Min Read
    data analytics and gold trading
    Data Analytics and the New Era of Gold Trading
    9 Min Read
    composable analytics
    How Composable Analytics Unlocks Modular Agility for Data Teams
    9 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Doing Data Mining Out of Order
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > Doing Data Mining Out of Order
Data Mining

Doing Data Mining Out of Order

DeanAbbott
DeanAbbott
4 Min Read
SHARE

I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don’t always; data mining often requires more creativity and “art” to re-work the data than we would like; it would be very nice if we could create a checklist and just run through the list on every project!

I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don’t always; data mining often requires more creativity and “art” to re-work the data than we would like; it would be very nice if we could create a checklist and just run through the list on every project! But unfortunately data doesn’t always cooperate in this way, and we therefore need to adapt to the specific data problems so that the data is better prepared.

For example, on a current financial risk project I am working, the customer is building data for predictive analytics for the first time. The customer is data savvy, but new to predictive analytics, so we’ve had to iterate several times on how the data is pulled and rolled up out of the database. In particular, target variable has had to be cleaned up because of historic coding anomalies.

One primary question to resolve for this project is an all-too-common debate over what is the right level of aggregation: do we use transactional data even though some customers have many transactions and some have few, or do we roll data up to the customer level to build customer risk models. (A transaction-based model will score each transaction for risk, whereas a customer-based model will score, daily, the risk associated with each customer given the new transactions that have been added.) There are advantages and disadvantages to both, but in this case, we are building a customer-centric risk model for reasons that make sense in this particular business context.

More Read

Image
B2B Software Startups: The SaaS Platform Dilemma
Department of State’s Consular Systems and Technology: A Track Record of Innovation
To Hell with Business Intelligence, try Decision Management.
Privacy in Behavioral Targeting: Poll Results
A demo of Swype, a new innovation in text entry that recently…

Back to the CRISP-DM process and why it is advantageous to deviate from CRISP-DM. In this project, we jumped from Business Understanding and the beginnings of Data Understanding straight to Modeling. I think in this case, I would call it “modeling” (small ‘m’) because we weren’t building models to predict risk, but rather to understand the target variable better. We were not sure exactly how clean the data was to begin with, especially the definition of the target variable, because no one had ever looked at the data in aggregate before, only on a single customer-by-customer basis. By building models, and seeing some fields that predict the target variable “too well”, we have been able to identify historic data inconsistencies and miscoding.

Now that we have the target variable better defined, I’m going back to the data understanding and data prep stages to complete those stages properly, and this is changing how the data will be prepped in addition to modifying the definition of the target variable. It’s also much more enjoyable to build models than do data prep, so for me this was a “win-win” anyway!

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai and satelite technology
How Machine Learning Improves Satellite Object Tracking
Exclusive Machine Learning
Diverse Research Datasets
The 5 Best Platforms Offering the Most Diverse Research Datasets in 2026
Big Data Exclusive
macro intelligence and ai
How Permutable AI is Advancing Macro Intelligence for Complex Global Markets
Artificial Intelligence Exclusive
warehouse accidents
Data Analytics and the Future of Warehouse Safety
Analytics Commentary Exclusive

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

Safeguarding Patient Data in EHRs

5 Min Read

Twitter Followers, Should We Have a New Metric?

4 Min Read

4 Best Practices for Sharing Workforce Data: Publishing Core Reports

5 Min Read

How to Use Pivot Tables to Mine Your Data

9 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive
giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?