What is Data Mining in a Database Management System?

Data mining is the extraction of knowledge from large amounts of data. It is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

The key properties of data mining are:

  1. Automatic discovery of patterns  
  2. Prediction of likely outcomes  
  3. Creation of actionable information  
  4. Focus on large datasets and databases  

Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing the following capabilities:

1. Automated prediction of trends and behaviors  

Data mining automates the process of finding predictive information in large databases. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings.
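
The targeted-marketing idea can be illustrated with a small Python sketch. It is not a production model: it simply estimates a response rate per customer segment from past mailings and ranks the segments by expected profit per mailing. The function name rank_segments, the segment names, and the revenue and cost figures are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

def rank_segments(past_mailings, revenue_per_response, cost_per_mailing):
    """Rank segments by expected profit per mailing, based on past response rates."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in past_mailings:
        sent[segment] += 1
        responded[segment] += int(did_respond)

    scores = {}
    for segment in sent:
        rate = responded[segment] / sent[segment]
        # Expected profit of one mailing to this segment.
        scores[segment] = rate * revenue_per_response - cost_per_mailing
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical history of past mailings: (segment, responded?)
history = [
    ("students", False), ("students", False), ("students", True), ("students", False),
    ("families", True), ("families", True), ("families", False), ("families", False),
    ("retirees", False), ("retirees", False), ("retirees", False), ("retirees", False),
]

ranking = rank_segments(history, revenue_per_response=20.0, cost_per_mailing=2.0)
for segment, expected_profit in ranking:
    print(segment, expected_profit)
```

Under these assumed figures, segments with a negative expected profit would be left out of the next mailing.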

2. Automated discovery of previously unknown patterns  

Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data-entry errors.

Tasks of Data Mining 


Data mining involves six common classes of tasks:

1. Anomaly detection (Outlier/change/deviation detection)  

The identification of unusual data records that might be interesting, or of data errors that require further investigation.
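
One simple way to flag unusual records is a z-score test, sketched below in Python. This is only one of many possible techniques; the function name find_outliers, the threshold of 2.0, and the sample values are assumptions made for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical transaction amounts; 95 looks like a keying error or a fraud candidate.
amounts = [10, 11, 9, 10, 12, 10, 11, 95]
print(find_outliers(amounts))
```

A flagged record is only a candidate: it still needs the further investigation described above to decide whether it is an error or a genuinely interesting case.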

2. Association rule learning (Dependency modelling)  

It searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
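
A minimal market basket sketch in Python is shown below. It considers only pairs of items and scores each candidate rule by support (the share of baskets containing both items) and confidence (the share of baskets with the left-hand item that also contain the right-hand item). The function name pair_rules, the thresholds, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical supermarket baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```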

3. Clustering 

It is the task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
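
As a rough illustration, the Python sketch below runs k-means on one-dimensional data. k-means is just one clustering method among many, and the deterministic choice of starting centroids used here is an assumption made to keep the example reproducible (k-means is more commonly started from random points). The function name kmeans_1d and the sample readings are hypothetical.

```python
def kmeans_1d(values, k, max_iterations=20):
    """Group numbers into k clusters; returns (centroids, clusters)."""
    ordered = sorted(values)
    # Deterministic start: the middle element of each of k equal slices of the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical readings that visibly fall into three groups.
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans_1d(readings, k=3)
print(centroids)
print(clusters)
```

No labels are supplied anywhere in the input; the groups emerge only from the distances between values.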

4. Classification  

It is the task of generalizing a known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
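
The spam example can be sketched with a tiny naive Bayes classifier in Python. This is a simplified illustration with add-one smoothing, not how any particular e-mail program works; the function names train and classify and the training messages are hypothetical.

```python
import math
from collections import Counter

def train(labeled_messages):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_messages)  # prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled examples (the "known structure").
training = [
    ("win a free prize now", "spam"),
    ("free money win now", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday with the team", "legitimate"),
]
model = train(training)
print(classify("free prize now", model))
print(classify("agenda for the team meeting", model))
```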

5. Regression  

It attempts to find a function which models the data with the least error.  
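
For instance, ordinary least squares fits a straight line by minimizing the sum of squared errors. The Python sketch below uses the closed-form solution for a single input variable; the function name fit_line and the data points are hypothetical, and a straight line is only one possible choice of function.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of products of x and y deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```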

6. Summarization  

It provides a more compact representation of the data set, including visualization and report generation.
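
A very small example of summarization is reducing a column of numbers to a handful of descriptive statistics. The Python sketch below assumes a plain list of numeric values; the function name summarize and the sample data are hypothetical.

```python
import statistics

def summarize(values):
    """Condense a list of numbers into a few descriptive statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical daily order counts.
orders = [12, 7, 3, 18, 10]
summary = summarize(orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```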

Architecture of Data Mining 

The architecture of a typical data mining system includes the following major components:


1. Knowledge Base

This is the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, which organize attributes or attribute values into different levels of abstraction.
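
A concept hierarchy can be pictured as a mapping from a lower level of abstraction to a higher one. The Python sketch below assumes a hypothetical city-to-country hierarchy and uses it to roll sales figures up from cities to countries; the names CITY_TO_COUNTRY and roll_up and the sales amounts are invented for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: city (lower level) -> country (higher level).
CITY_TO_COUNTRY = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

def roll_up(records, hierarchy):
    """Aggregate (city, amount) records at the country level."""
    totals = Counter()
    for city, amount in records:
        # Cities missing from the hierarchy are grouped under "Unknown".
        totals[hierarchy.get(city, "Unknown")] += amount
    return dict(totals)

# Hypothetical sales records.
sales = [("Toronto", 100), ("Chicago", 40), ("Vancouver", 25), ("New York", 75)]
by_country = roll_up(sales, CITY_TO_COUNTRY)
print(by_country)
```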

2. Data Mining Engine

This engine is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.

3. Pattern Evaluation Module

This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns. It may use interestingness thresholds to filter out discovered patterns.

For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process, so as to confine the search to only the interesting patterns.
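
One way to picture pushing the evaluation into the mining process is the pruning step used in Apriori-style frequent itemset mining, sketched below in Python. Here minimum support stands in for the interestingness threshold, which is an assumption: items that fail the threshold are discarded before any pairs are counted, so pairs that cannot be frequent are never generated. The function name frequent_pairs and the transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs meeting min_support, pruning early."""
    n = len(transactions)
    # Pass 1: evaluate single items and keep only those that meet the threshold.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {item for item, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count only pairs built from frequent items. A pair containing an
    # infrequent item cannot itself be frequent, so it is never generated.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

# Hypothetical transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
    ["a", "b", "e"],
]
result = frequent_pairs(transactions, min_support=0.6)
print(result)
```

Filtering only after all pairs had been counted would give the same answer here, but it would spend work on pairs involving "c", "d", and "e" that are discarded anyway.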

4. User Interface

This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results.

In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.