Data Mining Methodologies

December 18, 2008
I use the CRISP-DM methodology for all data mining projects because it is industry- and tool-neutral and, in my view, the most comprehensive of the methodologies available. Some data mining software vendors have come up with their own methodologies. Check them out below.

MS SQL SERVER DATA MINING

1. Defining the Problem: Analyze business requirements, define the scope of the problem, define the metrics by which the model will be evaluated, and define specific objectives for the data mining project.

2. Preparing Data: Remove or handle bad data, find correlations in the data, identify the most accurate sources of data, and determine which columns are the most appropriate for use in analysis.

3. Exploring the Data: Calculate the minimum and maximum values, calculate mean and standard deviations, and look at the distribution of the data.

4. Building Models: Specify the input columns, the attribute that you are predicting, and parameters that tell the algorithm how to process the data.

5. Exploring & Validating Models: Explore the mining models that you have built and test their effectiveness before deploying them.

6. Deploying & Updating Models: Use the models to create predictions, which you can then use to make business decisions; create content queries to retrieve statistics, rules, or formulas from the model; embed data mining functionality directly into an application; and update the models after review and analysis, or dynamically as more data comes into the organization.
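The Exploring the Data step above can be sketched in a few lines of plain Python. This is only an illustration of the calculations the step names (minimum, maximum, mean, standard deviation, distribution), not SQL Server code; the `ages` column, the helper names `summarize` and `distribution`, and the bin width are all assumptions made for the example.

```python
import statistics
from collections import Counter

# Hypothetical column of values; illustrative only, not from a real dataset.
ages = [23, 35, 35, 41, 52, 29, 35, 60, 47, 23]

def summarize(values):
    """Return the basic statistics named in the Exploring the Data step."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def distribution(values, bin_width):
    """Count how many values fall into each fixed-width bin (a crude histogram)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

summary = summarize(ages)
histogram = distribution(ages, bin_width=10)
print(summary)
print(histogram)
```

Each key of `histogram` is the lower edge of a bin, so a skewed or gappy column shows up before any model is built.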

ORACLE DATA MINING

1. Problem Definition: Specify the project objectives and requirements from a business perspective, formulate them as a data mining problem, and develop a preliminary implementation plan.

2. Data Gathering and Preparation: Take a closer look at the data, remove some of the data or add additional data, identify data quality problems, and scan for patterns in the data. Typical tasks include table, case, and attribute selection as well as data cleansing and transformation.

3. Model Building and Evaluation: Select and apply various modeling techniques and calibrate the parameters to optimal values. If the algorithm requires data transformations, step back to the previous phase to implement them.

4. Knowledge Deployment: Can involve scoring (the application of models to new data), the extraction of model details (for example the rules of a decision tree), or the integration of data mining models within applications, data warehouse infrastructure, or query and reporting tools.
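To make the two deployment activities in step 4 concrete, here is a minimal sketch in plain Python of scoring new data and extracting the rules of a decision tree. It does not use any Oracle API; the tree itself, the attribute names `income` and `age`, the thresholds, and the labels are hypothetical and written by hand for illustration.

```python
# A tiny hand-written decision tree. Attributes, thresholds, and labels are hypothetical.
TREE = {
    "attribute": "income",
    "threshold": 50000,
    "below": {"label": "low_value"},
    "at_or_above": {
        "attribute": "age",
        "threshold": 30,
        "below": {"label": "low_value"},
        "at_or_above": {"label": "high_value"},
    },
}

def score(tree, case):
    """Scoring: walk the tree to assign a label to one new case."""
    node = tree
    while "label" not in node:
        is_below = case[node["attribute"]] < node["threshold"]
        node = node["below" if is_below else "at_or_above"]
    return node["label"]

def extract_rules(node, conditions=()):
    """Model-detail extraction: turn each root-to-leaf path into an IF/THEN rule."""
    if "label" in node:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    attr, thr = node["attribute"], node["threshold"]
    return (
        extract_rules(node["below"], conditions + (f"{attr} < {thr}",))
        + extract_rules(node["at_or_above"], conditions + (f"{attr} >= {thr}",))
    )

# New, previously unseen cases (illustrative values).
new_cases = [{"income": 72000, "age": 41}, {"income": 38000, "age": 55}]
predictions = [score(TREE, case) for case in new_cases]
rules = extract_rules(TREE)

print(predictions)
for rule in rules:
    print(rule)
```

Scoring returns one label per case, while rule extraction describes the model itself, which is why the two are listed as separate deployment options.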

SEMMA from SAS

1. Sample the data by creating one or more data tables. The sample should be large enough to contain the significant information, yet small enough to process.

2. Explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.

3. Modify the data by creating, selecting, and transforming the variables to focus the model selection process.

4. Model the data by using the analytical tools to search for a combination of the data that reliably predicts a desired outcome.

5. Assess the data by evaluating the usefulness and reliability of the findings from the data mining process.
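The Sample step can be sketched as follows. This is plain Python, not SAS code. The 10% sampling fraction, the fixed seed, and the 70/30 split into training and validation tables are assumptions: the split is one common reading of "one or more data tables", not something the step spells out.

```python
import random

def sample_rows(rows, fraction, seed=42):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)

def partition(rows, train_share=0.7, seed=42):
    """Split the sample into training and validation tables."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical source table of 1,000 rows.
population = [{"id": i, "value": i % 7} for i in range(1000)]

sample = sample_rows(population, fraction=0.1)
train, validation = partition(sample)
print(len(sample), len(train), len(validation))
```

The fraction is the knob that trades off the two requirements in step 1: raise it if the sample misses significant information, lower it if the sample is too slow to process.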

CRISP-DM (CRoss Industry Standard Process for Data Mining)

1. Business Understanding: Understand the project objectives and requirements from a business perspective, then convert this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.

2. Data Understanding: Collect initial data, then get familiar with it: identify data quality problems, discover first insights into the data, and detect interesting subsets that suggest hypotheses about hidden information.

3. Data Preparation: Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.

4. Modeling: Select and apply various modeling techniques, calibrate their parameters to optimal values, and step back to the data preparation phase if needed.

5. Evaluation: Evaluate the model and review the steps executed to construct it, to be certain it properly achieves the business objectives. At the end of this phase, a decision on the use of the data mining results should be reached.

6. Deployment: Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the customer, not the data analyst, who will carry out the deployment steps.
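The hand-off between Modeling (calibrating parameters) and Evaluation (reaching a decision) can be sketched like this. It is a simplified illustration rather than part of the CRISP-DM specification. The scored records, the candidate thresholds, and the use of 75% accuracy as a stand-in for "the business objectives" are all assumptions.

```python
# Hypothetical (model_score, actual_outcome) pairs; illustrative values only.
validation = [(0.91, 1), (0.80, 1), (0.75, 0), (0.62, 1), (0.55, 0),
              (0.48, 1), (0.33, 0), (0.27, 0), (0.15, 0), (0.05, 0)]
holdout = [(0.88, 1), (0.70, 1), (0.52, 0), (0.45, 1), (0.30, 0), (0.10, 0)]

def accuracy(records, threshold):
    """Share of records where 'score >= threshold' matches the actual outcome."""
    hits = sum((score >= threshold) == bool(actual) for score, actual in records)
    return hits / len(records)

def calibrate(records, candidates):
    """Modeling: pick the candidate threshold with the best validation accuracy."""
    return max(candidates, key=lambda t: accuracy(records, t))

def evaluate(records, threshold, required_accuracy):
    """Evaluation: check the calibrated model against the assumed business target."""
    achieved = accuracy(records, threshold)
    return {"accuracy": achieved, "deploy": achieved >= required_accuracy}

# Calibrate on one table, then evaluate on a separate one.
best = calibrate(validation, candidates=[0.3, 0.4, 0.5, 0.6, 0.7])
decision = evaluate(holdout, best, required_accuracy=0.75)
print(best, decision)
```

Evaluating on a separate holdout table keeps the deployment decision from resting on the same records that were used to tune the parameter.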

http://datalligence.blogspot.com/
