SmartData Collective

Can We Automate Data Mining?

Sandro Saitta

That’s a big question! Back in 2006, we started the discussion on Data Mining Research with a post about the book Java Data Mining. We were fortunate to get opinions from experts and from one of the book’s authors. In 2010, we continued the discussion, looking at specific aspects of data mining that could be automated.

Recently, I re-launched the debate on the Swiss Association for Analytics. However, I think it is worth a dedicated blog post. To answer this big question, we need to analyze the different phases of data mining and estimate which ones can be automated. For this purpose, I have chosen the CRISP-DM methodology (I guess any other data mining process would lead to similar conclusions).

Business understanding

In this critical step, we transform a business problem into a data mining one. We need to understand what should be solved and why. The answers guide the steps that follow. It is clear that this step cannot be automated for a new project: the data miner has to interact with experts to define the data mining problem to solve.

Data understanding

This step consists of understanding the data: how they were collected, their particularities, etc. Again, the data miner works in collaboration with field experts to derive knowledge useful for preparing the data (next step). This is a manual task that cannot be automated.

Data preparation

In this step, we transform raw data into meaningful information to mine. An example is outlier detection (and removal). Some companies argue that their tools can automate this step. This is true to a certain extent, but there are limitations. Here is a simple example: what is the threshold for the variable “age” to be an outlier? 100, 110, 150 years old? This is problem dependent. The same issue happens for missing values. Detecting them is often straightforward, but deciding on the action to take needs manual intervention.
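To make the “age” example concrete, here is a minimal sketch in Python. The function names and the sample values are hypothetical and chosen only for illustration. The detection itself is trivial to automate, but the flagged records change with the threshold, and nothing in the code can say which threshold is right for a given problem.

```python
from typing import List, Optional


def flag_age_outliers(ages: List[Optional[float]], max_age: float) -> List[bool]:
    """Flag each recorded age that falls outside [0, max_age].

    Missing values (None) are not flagged here; they are handled separately.
    """
    return [a is not None and (a < 0 or a > max_age) for a in ages]


def missing_indices(values: List[Optional[float]]) -> List[int]:
    """Return the positions of missing values (detection only, no action taken)."""
    return [i for i, v in enumerate(values) if v is None]


# Hypothetical sample: one missing value, one impossible value, two very old ages.
ages = [34, 105, None, 120, 58, -1]

flags_100 = flag_age_outliers(ages, max_age=100)
flags_110 = flag_age_outliers(ages, max_age=110)
missing = missing_indices(ages)

print(flags_100)  # [False, True, False, True, False, True]
print(flags_110)  # [False, False, False, True, False, True]
print(missing)    # [2]
```

With a threshold of 100, the 105-year-old is an outlier; with 110, it is a valid record. Likewise, the code finds the missing value at position 2, but whether to drop the row, impute a value or keep it as is remains a decision for the data miner.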

Another important aspect of data preparation is feature selection and extraction. While selection can be automated, extraction (through aggregation) needs understanding of the data. Finally, any data mining tool can automate the target variable detection. However, the final choice is left to the data miner, who knows the business problem to solve.
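As an illustration of feature extraction through aggregation, the sketch below collapses raw transaction rows into one row of features per customer. The data layout (customer identifier, amount) and the feature names are assumptions made for this example. The aggregation is easy to code, but choosing to aggregate per customer, and choosing these three features rather than others, comes from understanding the data and the business problem.

```python
from collections import defaultdict


def aggregate_by_customer(transactions):
    """Collapse (customer_id, amount) rows into one feature dict per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, amount in transactions:
        totals[customer_id] += amount
        counts[customer_id] += 1
    return {
        cid: {
            "n_purchases": counts[cid],
            "total_spent": totals[cid],
            "mean_basket": totals[cid] / counts[cid],
        }
        for cid in totals
    }


# Hypothetical raw data: one row per purchase.
transactions = [("c1", 20.0), ("c2", 5.0), ("c1", 40.0), ("c2", 15.0), ("c1", 30.0)]
features = aggregate_by_customer(transactions)

print(features["c1"])  # {'n_purchases': 3, 'total_spent': 90.0, 'mean_basket': 30.0}
print(features["c2"])  # {'n_purchases': 2, 'total_spent': 20.0, 'mean_basket': 10.0}
```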

Modeling

This step is where we apply modeling algorithms to processed data. Among others, it involves selecting a data mining algorithm and tuning its parameters. This is certainly the task that can be the most easily automated. Some vendors claim that their tools can automate the model building process. The concept of testing several algorithms with different sets of parameters (tuning) can be automated to a certain extent. However, it assumes that there are enough data, that the choice of the algorithm is not business dependent (which is usually not the case) and that the evaluation criterion is known (see below).
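The idea of testing several algorithms with several parameter values can be sketched in a few lines of Python. This is a toy illustration, not any vendor's method: the two forecasters, the parameter grids, the sample series and all function names are assumptions made for the example. Note that the search loop needs the evaluation criterion as an input; it cannot choose it.

```python
def mean_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def exp_smoothing_forecast(history, alpha):
    """Predict the next value with simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def one_step_errors(series, forecaster, param, start):
    """Signed one-step-ahead errors for series[start:], each using only earlier values."""
    return [series[t] - forecaster(series[:t], param) for t in range(start, len(series))]


def mean_absolute(errors):
    """Evaluation criterion supplied by the data miner (here: mean absolute error)."""
    return sum(abs(e) for e in errors) / len(errors)


def search(series, candidates, start, criterion):
    """Score every (algorithm, parameter) pair and return the best (score, name, param)."""
    scored = []
    for name, forecaster, params in candidates:
        for param in params:
            score = criterion(one_step_errors(series, forecaster, param, start))
            scored.append((score, name, param))
    return min(scored)


# Hypothetical series and candidate grid.
series = [10, 12, 11, 13, 12, 14, 13, 15]
candidates = [
    ("moving_average", mean_forecast, [1, 2, 3]),
    ("exp_smoothing", exp_smoothing_forecast, [0.2, 0.5, 0.8]),
]

best = search(series, candidates, start=3, criterion=mean_absolute)
print(best)  # (score, algorithm name, parameter value)
```

The loop is fully mechanical, which is why this step lends itself to automation. The parts it takes as given (the list of candidate algorithms, the amount of data held out and the criterion) are exactly the assumptions listed above.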

Cross Industry Standard Process for Data Mining (CRISP-DM)

Evaluation

In order to validate our data mining results, we need evaluation criteria. Although applying a criterion can be automated and different modeling algorithms can be compared, the choice of the criterion may be business dependent. In the case of forecasting, for example, different evaluation criteria exist such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE). If we compare different forecasting algorithms on the same time series, we can use RMSE. If the goal is to compare results across different time series, MASE is more appropriate because it does not depend on the scale of the series. This is business dependent and thus difficult to automate.
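The three criteria can be written down in a few lines. The sketch below uses the common definition of MASE, in which the forecast MAE is divided by the in-sample MAE of the naive forecast (each value predicted by the previous one); the sample numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, training):
    """MAE divided by the in-sample MAE of the naive (previous value) forecast."""
    naive_mae = sum(
        abs(training[t] - training[t - 1]) for t in range(1, len(training))
    ) / (len(training) - 1)
    return mae(actual, predicted) / naive_mae


# Hypothetical series, and the same series expressed in units 100 times smaller.
training = [10, 12, 11, 13]
actual, predicted = [14, 15], [13, 17]

scale = 100
training_big = [v * scale for v in training]
actual_big = [v * scale for v in actual]
predicted_big = [v * scale for v in predicted]

print(rmse(actual, predicted), rmse(actual_big, predicted_big))
print(mase(actual, predicted, training), mase(actual_big, predicted_big, training_big))
```

On these numbers, RMSE grows by a factor of 100 when the series is rescaled, while MASE stays at about 0.9 in both cases. That is why RMSE is fine for ranking algorithms on one series but misleading when series of different magnitudes are compared.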

Deployment

In this phase, the goal is to transform our proof of concept or prototype into an industrialized solution. This step involves transforming our “one shot” project into a solution that can work with as few manual interventions as possible. Although standards such as Predictive Model Markup Language (PMML) are appearing, this step still requires manual intervention. Questions such as where and how to integrate our data mining process within an overall solution/tool need to be explored.

In conclusion, we have seen that most data mining steps from the CRISP-DM methodology cannot be automated and need manual intervention. Data preparation and modeling, to a certain extent, could be automated. However, as data mining professionals know, most of the effort in a data mining project concerns business and data understanding. Here is an excellent metaphor from Berry and Linoff (re-explained by David S. Coppock):

“The camera can relieve the photographer from having to set the shutter speed, aperture and other settings every time a picture is taken. This makes the process easier for expert photographers and makes better photography accessible to people who are not experts. But this is still automating only a small part of the process of producing a photograph. Choosing the subject, perspective and lighting, getting to the right place at the right time, printing and mounting, and many other aspects are all important in producing a good photograph.”

What about you? Do you think we can automate data mining?
