Can We Automate Data Mining?

SandroSaitta

That’s a big question! Back in 2006, we started the discussion on Data Mining Research with a post about the book Java Data Mining. We were fortunate to get opinions from experts and one of the book’s authors. In 2010, we continued the discussion about specific aspects of data mining which could be automated.

Recently, I re-launched the debate on the Swiss Association for Analytics. However, I think it is worth a dedicated blog post. In order to answer this big question, we need to analyze the different phases of data mining and estimate which ones can be automated. For this purpose, I have chosen the CRISP-DM methodology (I guess any other data mining process would lead to similar conclusions).

Business understanding

In this critical step, we transform a business problem into a data mining one. We need to understand what should be solved and why. The answers guide the steps that follow. It is clear that this step cannot be automated for a new project. The data miner has to interact with experts to define the data mining problem to solve.

Data understanding

This step consists of understanding the data, the way they have been collected, their particularities, etc. Again, the data miner works in collaboration with field experts to derive knowledge useful for preparing the data (next step). This is a manual task that cannot be automated.

Data preparation

In this step, we transform raw data into meaningful information to mine. An example is outlier detection (and removal). Some companies argue that their tools can automate this step. This is true to a certain extent, but there are limitations. Here is a simple example: what is the threshold for the variable “age” to be an outlier? 100, 110, 150 years old? This is problem dependent. The same issue happens for missing values. Detecting them is often straightforward, but deciding on the action to take needs manual intervention.
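To make the threshold question concrete, here is a minimal sketch in plain Python. The function names and the sample ages are made up for illustration and are not taken from any particular tool. It shows that the detection itself is trivial to automate, while the result depends entirely on a threshold that someone has to choose.

```python
from typing import Optional


def flag_age_outliers(ages: list[Optional[float]], max_age: float) -> list[bool]:
    """Return True for each recorded age above the chosen threshold."""
    return [age is not None and age > max_age for age in ages]


def find_missing(ages: list[Optional[float]]) -> list[int]:
    """Return the positions of missing values (None)."""
    return [i for i, age in enumerate(ages) if age is None]


# Hypothetical sample: one missing value and a few suspicious ages.
ages = [34, 102, None, 57, 118, 151]

# The same data gives a different number of outliers for each threshold.
for threshold in (100, 110, 150):
    n_outliers = sum(flag_age_outliers(ages, threshold))
    print(f"threshold={threshold}: {n_outliers} outlier(s)")

# Finding missing values is mechanical; deciding whether to drop,
# impute or keep them is the part that needs a human decision.
print("missing at positions:", find_missing(ages))
```

With this sample, the thresholds 100, 110 and 150 flag three, two and one records respectively. Nothing in the data says which of these answers is the right one; that comes from the problem being solved.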

Another important aspect of data preparation is feature selection and extraction. While selection can be automated, extraction (through aggregation) needs understanding of the data. Finally, any data mining tool can automate the target variable detection. However, the final choice is left to the data miner, who knows the business problem to solve.

Modeling

This step is where we apply modeling algorithms to processed data. Among others, it involves selecting a data mining algorithm and tuning its parameters. This is certainly the task that can be the most easily automated. Some vendors claim that their tools can automate the model building process. The concept of testing several algorithms with different sets of parameters (tuning) can be automated to a certain extent. However, it assumes that there are enough data, that the choice of the algorithm is not business dependent (which is usually not the case) and that the evaluation criterion is known (see below).
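The idea of testing several algorithms with different parameters can be sketched in a few lines. The example below is only an illustration in plain Python: the two toy forecasters, the parameter grids, the series and the `search` helper are all assumptions chosen for brevity, not the way any vendor tool works. Note that the evaluation criterion has to be passed in from outside, which is exactly the assumption mentioned above.

```python
from statistics import mean


def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def exp_smoothing_forecast(history, alpha):
    """Predict the next value with simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def one_step_forecasts(series, n_train, forecaster, param):
    """Forecast each held-out point from all observations before it."""
    return [forecaster(series[:t], param) for t in range(n_train, len(series))]


def search(series, n_train, candidates, criterion):
    """Try every (algorithm, parameter) pair and keep the lowest score."""
    actual = series[n_train:]
    results = []
    for name, forecaster, grid in candidates:
        for param in grid:
            predicted = one_step_forecasts(series, n_train, forecaster, param)
            results.append((criterion(actual, predicted), name, param))
    return min(results, key=lambda result: result[0])


# Hypothetical series and parameter grids.
series = [12, 14, 13, 15, 16, 15, 17, 18, 17, 19, 20, 19]
candidates = [
    ("moving_average", moving_average_forecast, [2, 3, 4]),
    ("exp_smoothing", exp_smoothing_forecast, [0.2, 0.5, 0.8]),
]

best_score, best_name, best_param = search(
    series, 8, candidates, mean_absolute_error
)
print(best_name, best_param, round(best_score, 3))
```

The loop is fully automatic, but three decisions were made by hand before it could run: which algorithms are acceptable candidates, how much data to hold out, and which criterion defines "best".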

Figure: Cross Industry Standard Process for Data Mining (CRISP-DM)

Evaluation

In order to validate our data mining results, we need evaluation criteria. Although applying a criterion can be automated and different modeling algorithms can be compared, the choice of the criterion may be business dependent. In the case of forecasting, for example, different evaluation criteria exist, such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Scaled Error (MASE). If we compare different forecasting algorithms on the same time series, we can use RMSE. If the goal is to compare accuracy across different time series, MASE is more appropriate because it does not depend on the scale of the series. This is business dependent and thus difficult to automate.
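The three criteria can be written down in a few lines of plain Python. This is a sketch with made-up numbers; MASE is computed here in its usual non-seasonal form, that is, the forecast MAE divided by the in-sample MAE of the naive "previous value" forecast on the training data.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mase(actual, predicted, training):
    """MAE scaled by the in-sample MAE of the naive one-step forecast."""
    naive_errors = [abs(training[i] - training[i - 1])
                    for i in range(1, len(training))]
    return mae(actual, predicted) / (sum(naive_errors) / len(naive_errors))


# Hypothetical series: training history, held-out values and forecasts.
training = [10, 12, 11, 13, 12]
actual = [14, 13]
predicted = [13, 15]

# The same series expressed in units that are 1000 times smaller.
training_k = [1000 * v for v in training]
actual_k = [1000 * v for v in actual]
predicted_k = [1000 * v for v in predicted]

print("RMSE:", rmse(actual, predicted), "vs", rmse(actual_k, predicted_k))
print("MASE:", mase(actual, predicted, training),
      "vs", mase(actual_k, predicted_k, training_k))
```

Rescaling the series multiplies RMSE by 1000 but leaves MASE unchanged, which is why RMSE works for ranking algorithms on one series while MASE is the safer choice across series. Computing each number is automatic; knowing which comparison the business needs is not.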

Deployment

In this phase, the goal is to transform our proof of concept or prototype into an industrialized solution. This step involves transforming our “one shot” project into a solution that can work with as few manual interventions as possible. Although standards such as Predictive Model Markup Language (PMML) are appearing, this step still requires manual intervention. Questions such as where and how to integrate our data mining process within an overall solution/tool need to be explored.

In conclusion, we have seen that most data mining steps from the CRISP-DM methodology cannot be automated and need manual intervention. Data preparation and modeling, to a certain extent, could be automated. However, as data mining professionals know, most of the effort in a data mining project concerns business and data understanding. Here is an excellent metaphor from Berry and Linoff (re-explained by David S. Coppock):

“The camera can relieve the photographer from having to set the shutter speed, aperture and other settings every time a picture is taken. This makes the process easier for expert photographers and makes better photography accessible to people who are not experts. But this is still automating only a small part of the process of producing a photograph. Choosing the subject, perspective and lighting, getting to the right place at the right time, printing and mounting, and many other aspects are all important in producing a good photograph.”

What about you? Do you think we can automate data mining?
