Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    warehouse accidents
    Data Analytics and the Future of Warehouse Safety
    10 Min Read
    stock investing and data analytics
    How Data Analytics Supports Smarter Stock Trading Strategies
    4 Min Read
    predictive analytics risk management
    How Predictive Analytics Is Redefining Risk Management Across Industries
    7 Min Read
    data analytics and gold trading
    Data Analytics and the New Era of Gold Trading
    9 Min Read
    composable analytics
    How Composable Analytics Unlocks Modular Agility for Data Teams
    9 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Doing Data Mining Out of Order
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Mining > Doing Data Mining Out of Order
Data Mining

Doing Data Mining Out of Order

DeanAbbott
DeanAbbott
4 Min Read
SHARE

I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don’t always; data mining often requires more creativity and “art” to re-work the data than we would like; it would be very nice if we could create a checklist and just run through the list on every project!

I like the CRISP-DM process model for data mining, teach from it, and use it on my projects. I commend it to practitioners and managers routinely as an aid during any data mining project. However, while the process sequence is generally the one I use, I don’t always; data mining often requires more creativity and “art” to re-work the data than we would like; it would be very nice if we could create a checklist and just run through the list on every project! But unfortunately data doesn’t always cooperate in this way, and we therefore need to adapt to the specific data problems so that the data is better prepared.

For example, on a current financial risk project I am working, the customer is building data for predictive analytics for the first time. The customer is data savvy, but new to predictive analytics, so we’ve had to iterate several times on how the data is pulled and rolled up out of the database. In particular, target variable has had to be cleaned up because of historic coding anomalies.

One primary question to resolve for this project is an all-too-common debate over what is the right level of aggregation: do we use transactional data even though some customers have many transactions and some have few, or do we roll data up to the customer level to build customer risk models. (A transaction-based model will score each transaction for risk, whereas a customer-based model will score, daily, the risk associated with each customer given the new transactions that have been added.) There are advantages and disadvantages to both, but in this case, we are building a customer-centric risk model for reasons that make sense in this particular business context.

More Read

Consolidation in the Social Business Market Continues: Salesforce.com Acquires Radian6
Technorati Blog Search:Tags / predictive analytics
Some cities are examining the possibility of installing data…
AT&T’s service, called FamilyMaps, allows people to…
Capturing Knowledge, and Making it ‘Findable’ (2 of 4)

Back to the CRISP-DM process and why it is advantageous to deviate from CRISP-DM. In this project, we jumped from Business Understanding and the beginnings of Data Understanding straight to Modeling. I think in this case, I would call it “modeling” (small ‘m’) because we weren’t building models to predict risk, but rather to understand the target variable better. We were not sure exactly how clean the data was to begin with, especially the definition of the target variable, because no one had ever looked at the data in aggregate before, only on a single customer-by-customer basis. By building models, and seeing some fields that predict the target variable “too well”, we have been able to identify historic data inconsistencies and miscoding.

Now that we have the target variable better defined, I’m going back to the data understanding and data prep stages to complete those stages properly, and this is changing how the data will be prepped in addition to modifying the definition of the target variable. It’s also much more enjoyable to build models than do data prep, so for me this was a “win-win” anyway!

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

Diverse Research Datasets
The 5 Best Platforms Offering the Most Diverse Research Datasets in 2026
Big Data Exclusive
macro intelligence and ai
How Permutable AI is Advancing Macro Intelligence for Complex Global Markets
Artificial Intelligence Exclusive
warehouse accidents
Data Analytics and the Future of Warehouse Safety
Analytics Commentary Exclusive
stock investing and data analytics
How Data Analytics Supports Smarter Stock Trading Strategies
Analytics Exclusive

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

data science and data mining differences
Data Science

Deciphering The Seldom Discussed Differences Between Data Mining and Data Science

8 Min Read

MasterCard Applies Big Data to Help Retailers Achieve Better Results

6 Min Read

#13: Here’s a thought…

6 Min Read

IBM and ILOG for a smarter system

5 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?