SmartData Collective

Can We Automate Data Mining?

SandroSaitta

That’s a big question! Back in 2006, we started the discussion on Data Mining Research with a post about the book Java Data Mining. We were fortunate to get opinions from experts and from one of the book’s authors. In 2010, we continued the discussion by looking at specific aspects of data mining that could be automated.

Recently, I re-launched the debate on the Swiss Association for Analytics. However, I think it is worth a dedicated blog post. To answer this big question, we need to analyze the different phases of data mining and estimate which ones can be automated. For this purpose, I have chosen the CRISP-DM methodology (I guess any other data mining process would lead to similar conclusions).

Business understanding

In this critical step, we transform a business problem into a data mining one. We need to understand what should be solved and why. The answers guide all the steps that follow. It is clear that this step cannot be automated for a new project. The data miner has to interact with experts to define the data mining problem to solve.

Data understanding

This step consists of understanding the data, the way they have been collected, their particularities, etc. Again, the data miner works in collaboration with field experts to derive knowledge useful for preparing the data (next step). This is a manual task that cannot be automated.

Data preparation

In this step, we transform raw data into meaningful information to mine. An example is outlier detection (and removal). Some companies argue that their tools can automate this step. This is true to a certain extent, but there are limitations. Here is a simple example: what is the threshold for the variable “age” to be an outlier? 100, 110, 150 years old? This is problem dependent. The same issue happens for missing values. Detecting them is often straightforward, but deciding on the action to take needs manual intervention.
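To make the “age” example concrete, here is a minimal Python sketch. The function name `flag_age_issues`, the sample values and the two thresholds are assumptions chosen for illustration only. Detection is mechanical, but the `max_age` argument has to come from someone who knows the problem, and the code deliberately does not decide what to do with the flagged records.

```python
def flag_age_issues(ages, max_age):
    """Return the indices of missing, outlying and valid age values.

    max_age is NOT inferred from the data: it is a business decision
    supplied by the data miner (hypothetical parameter).
    """
    missing = [i for i, a in enumerate(ages) if a is None]
    outliers = [
        i for i, a in enumerate(ages)
        if a is not None and (a < 0 or a > max_age)
    ]
    valid = [
        i for i, a in enumerate(ages)
        if a is not None and 0 <= a <= max_age
    ]
    return {"missing": missing, "outliers": outliers, "valid": valid}


# Illustrative data only: None marks a missing value.
ages = [34, None, 103, 27, 140, -1, 58]

# Same data, two different thresholds, two different sets of outliers.
report_100 = flag_age_issues(ages, max_age=100)
report_120 = flag_age_issues(ages, max_age=120)

print(report_100["outliers"])
print(report_120["outliers"])
print(report_100["missing"])
```

With a threshold of 100 the value 103 is flagged, with 120 it is kept. Which answer is right, and whether flagged or missing records should be dropped, imputed or investigated, remains a manual decision.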

Another important aspect of data preparation is feature selection and extraction. While selection can be automated, extraction (through aggregation) needs understanding of the data. Finally, any data mining tool can automate the target variable detection. However, the final choice is left to the data miner, who knows the business problem to solve.
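As an illustration of the selection part, the sketch below keeps only the features whose absolute Pearson correlation with the target exceeds a cutoff. This is one simple filter among many, not a recommended method. The column names, the data and the 0.8 cutoff are hypothetical. Note that a column such as `monthly_spend` would itself be the result of an extraction step (aggregating raw transactions), which the code takes as given.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_features(columns, target, min_abs_corr):
    """Keep the features whose |correlation| with the target reaches the cutoff."""
    return [
        name for name, values in columns.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]


# Hypothetical, already-extracted features for five customers.
columns = {
    "monthly_spend": [10, 20, 30, 40, 50],
    "customer_id": [3, 1, 4, 1, 5],
    "constant_flag": [1, 1, 1, 1, 1],
}
target = [12, 19, 33, 41, 48]

selected = select_features(columns, target, min_abs_corr=0.8)
print(selected)
```

The filter runs without supervision, but someone still had to decide that raw transactions should be aggregated per month in the first place.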

Modeling

This step is where we apply modeling algorithms to processed data. Among others, it involves selecting a data mining algorithm and tuning its parameters. This is certainly the task that can be the most easily automated. Some vendors claim that their tools can automate the model building process. The concept of testing several algorithms with different sets of parameters (tuning) can be automated to a certain extent. However, it supposes that there are enough data, that the choice of the algorithm is not business dependent (which is usually not the case) and that the evaluation criterion is known (see below).
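The idea of testing several algorithms with different parameters can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy data, the two candidate “algorithms” (a mean predictor and a one-dimensional nearest-neighbour regressor), the parameter values and the hold-out split. The point to notice is that the evaluation criterion is an argument of `search`: the loop is automated, the choice of criterion is not.

```python
def mean_predict(train, x):
    """Baseline: always predict the mean of the training targets."""
    return sum(y for _, y in train) / len(train)


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def search(train, valid, candidates, criterion):
    """Score every candidate on the validation set; lower is better."""
    actual = [y for _, y in valid]
    scores = {}
    for name, predict in candidates.items():
        predicted = [predict(train, x) for x, _ in valid]
        scores[name] = criterion(actual, predicted)
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: y = 2x + 1, split into training (even x) and validation (odd x).
data = [(x, 2 * x + 1) for x in range(20)]
train, valid = data[::2], data[1::2]

# Candidate algorithms and parameter settings to try.
candidates = {"mean": mean_predict}
for k in (1, 2, 5):
    candidates[f"knn_k={k}"] = lambda tr, x, k=k: knn_predict(tr, x, k)

best, scores = search(train, valid, candidates, criterion=mae)
print(best, scores[best])
```

Swapping `mae` for another criterion may change the winner, and nothing in the loop knows whether the business would prefer a simpler or more interpretable model.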

[Figure: Cross Industry Standard Process for Data Mining (CRISP-DM)]

Evaluation

In order to validate our data mining results, we need evaluation criteria. Although applying a criterion can be automated and different modeling algorithms can be compared, the choice of the criterion may be business dependent. In the case of forecasting, for example, different evaluation criteria exist, such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE). If we compare different forecasting algorithms on the same time series, we can use RMSE. If the goal is to compare results across different time series, MASE is more appropriate because it is scale-free, whereas RMSE is expressed in the units of each series. This is business dependent and thus difficult to automate.
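The three criteria can be written down in a few lines of Python. This is a minimal sketch with made-up numbers. It uses the non-seasonal form of MASE, where the forecast MAE is divided by the in-sample MAE of the naive “previous value” forecast. The helper `scale` is only there to show what happens when the same series is expressed in different units.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error: expressed in the units of the series."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean Absolute Error: also expressed in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, training):
    """Mean Absolute Scaled Error (non-seasonal).

    The forecast MAE is divided by the in-sample MAE of the naive
    forecast that repeats the previous observation.
    """
    naive_errors = [abs(training[i] - training[i - 1]) for i in range(1, len(training))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / naive_mae


def scale(values, factor):
    """Express a series in different units (illustration helper)."""
    return [factor * v for v in values]


# Made-up series: history, the next two true values, and a forecast.
training = [10.0, 12.0, 11.0, 13.0, 12.0]
actual = [14.0, 13.0]
forecast = [13.0, 15.0]

print(rmse(actual, forecast), mae(actual, forecast), mase(actual, forecast, training))

# The same series multiplied by 1000: RMSE grows with the units, MASE does not.
print(rmse(scale(actual, 1000), scale(forecast, 1000)))
print(mase(scale(actual, 1000), scale(forecast, 1000), scale(training, 1000)))
```

Computing any of these is trivial to automate. Deciding which one reflects the cost of a wrong forecast for the business is not.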

Deployment

In this phase, the goal is to transform our proof of concept or prototype into an industrialized solution. This step involves transforming our “one shot” project into a solution that can work with as few manual interventions as possible. Although standards such as Predictive Model Markup Language (PMML) are appearing, this step still requires manual intervention. Questions such as where and how to integrate our data mining process within an overall solution/tool need to be explored.

In conclusion, we have seen that most data mining steps from the CRISP-DM methodology cannot be automated and need manual intervention. Data preparation and modeling could, to a certain extent, be automated. However, as data mining professionals know, most of the effort in a data mining project concerns business and data understanding. Here is an excellent metaphor from Berry and Linoff (re-explained by David S. Coppock):

“The camera can relieve the photographer from having to set the shutter speed, aperture and other settings every time a picture is taken. This makes the process easier for expert photographers and makes better photography accessible to people who are not experts. But this is still automating only a small part of the process of producing a photograph. Choosing the subject, perspective and lighting, getting to the right place at the right time, printing and mounting, and many other aspects are all important in producing a good photograph.”

What about you? Do you think we can automate data mining?
