SmartData Collective

Doing Data Mining Out of Order

DeanAbbott
Last updated: 2011/01/22 at 4:41 PM

I like the CRISP-DM process model for data mining: I teach from it, use it on my projects, and routinely commend it to practitioners and managers as an aid during any data mining project. However, while its process sequence is generally the one I follow, I don’t always follow it. Data mining often requires more creativity and “art” in reworking the data than we would like. It would be very nice if we could create a checklist and just run through it on every project! Unfortunately, data doesn’t always cooperate in this way, so we need to adapt to the specific data problems in order to prepare the data better.

For example, on a current financial risk project I am working on, the customer is building data for predictive analytics for the first time. The customer is data savvy but new to predictive analytics, so we’ve had to iterate several times on how the data is pulled and rolled up out of the database. In particular, the target variable has had to be cleaned up because of historic coding anomalies.

One primary question to resolve for this project is the all-too-common debate over the right level of aggregation: do we use transactional data, even though some customers have many transactions and some have few, or do we roll the data up to the customer level to build customer risk models? (A transaction-based model will score each transaction for risk, whereas a customer-based model will score, daily, the risk associated with each customer given the new transactions that have been added.) There are advantages and disadvantages to both, but in this case we are building a customer-centric risk model for reasons that make sense in this particular business context.
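To make the aggregation choice concrete, here is a minimal sketch of rolling transaction rows up to one row per customer. The field names (`customer_id`, `amount`) and the summary features are hypothetical, chosen only for illustration; the actual project's fields and roll-up rules are not described in this post.

```python
from collections import defaultdict


def roll_up_to_customer(transactions):
    """Aggregate transaction rows into one summary row per customer.

    The field names and summary statistics are illustrative assumptions.
    """
    amounts = defaultdict(list)
    for row in transactions:
        amounts[row["customer_id"]].append(row["amount"])

    # Summaries put customers with many and few transactions on the same footing.
    return {
        cust: {
            "n_transactions": len(vals),
            "total_amount": sum(vals),
            "mean_amount": sum(vals) / len(vals),
            "max_amount": max(vals),
        }
        for cust, vals in amounts.items()
    }


# Hypothetical data: customer A has three transactions, customer B has one.
transactions = [
    {"customer_id": "A", "amount": 120.0},
    {"customer_id": "A", "amount": 80.0},
    {"customer_id": "A", "amount": 100.0},
    {"customer_id": "B", "amount": 500.0},
]
customer_features = roll_up_to_customer(transactions)
```

A customer-level model would be trained on rows like those in `customer_features`, while a transaction-level model would be trained on `transactions` directly.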

Back to the CRISP-DM process and why it can be advantageous to deviate from it. In this project, we jumped from Business Understanding and the beginnings of Data Understanding straight to Modeling. In this case I would call it “modeling” (small ‘m’), because we weren’t building models to predict risk, but rather to understand the target variable better. We were not sure exactly how clean the data was to begin with, especially the definition of the target variable, because no one had ever looked at the data in aggregate before, only one customer at a time. By building models and seeing some fields that predict the target variable “too well”, we have been able to identify historic data inconsistencies and miscoding.
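One simple way to look for fields that predict the target “too well” is to score each field on its own and flag any with near-perfect separation. The sketch below is an illustration under stated assumptions, not the method used on this project: the single-field AUC screen, the 0.95 threshold, and the field names (`balance`, `closed_flag`, `default`) are all hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both target classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def flag_suspicious_fields(rows, target, threshold=0.95):
    """Return numeric fields whose single-field AUC is suspiciously high.

    The threshold is an assumed cutoff; a flagged field is a prompt to
    investigate the data, not proof of a coding problem.
    """
    labels = [r[target] for r in rows]
    flagged = {}
    for field in rows[0]:
        if field == target:
            continue
        a = auc([r[field] for r in rows], labels)
        strength = max(a, 1 - a)  # direction of the relationship is irrelevant
        if strength >= threshold:
            flagged[field] = strength
    return flagged


# Hypothetical data: closed_flag mirrors the target exactly, balance does not.
rows = [
    {"balance": 100, "closed_flag": 0, "default": 0},
    {"balance": 900, "closed_flag": 0, "default": 0},
    {"balance": 400, "closed_flag": 0, "default": 0},
    {"balance": 300, "closed_flag": 1, "default": 1},
    {"balance": 700, "closed_flag": 1, "default": 1},
    {"balance": 200, "closed_flag": 1, "default": 1},
]
flagged = flag_suspicious_fields(rows, target="default")
```

Here only `closed_flag` is flagged. A field like this is worth tracing back to its source: it may have been populated after the outcome was known, or it may encode the target under another name.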

Now that we have the target variable better defined, I’m going back to the Data Understanding and Data Preparation stages to complete them properly. What we learned is changing how the data will be prepped, in addition to modifying the definition of the target variable. It’s also much more enjoyable to build models than to do data prep, so for me this was a “win-win” anyway!

DeanAbbott January 22, 2011