What is Data Mining in a Database Management System?
Data mining refers to extracting or mining knowledge from large amounts of data. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
The key properties of data mining are:
- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large datasets and databases
Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing the following capabilities:
1. Automated prediction of trends and behaviors
Data mining automates the process of finding predictive information in large databases. A typical example of a predictive problem is targeted marketing.
Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings.
2. Automated discovery of previously unknown patterns
Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together.
Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data-entry keying errors.
Tasks of Data Mining
Data mining involves six common classes of tasks:
1. Anomaly detection (Outlier/change/deviation detection)
The identification of unusual data records that might be interesting, or of data errors that require further investigation.
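One simple way to flag unusual records is a z-score test, sketched below. The threshold of 2.0, the function name find_outliers, and the transaction amounts are assumptions made for illustration; real systems often use more robust methods.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; 950.0 is deliberately unusual.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 18.5, 20.5, 950.0]
outliers = find_outliers(amounts)
print(outliers)
```

A flagged record is only a candidate: it may be a fraud case worth studying or a keying error that should be corrected.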
2. Association rule learning (Dependency modelling)
It searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits.
Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
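The market basket idea above can be sketched with two standard measures, support (how often a pair appears) and confidence (how often the right-hand item appears given the left-hand item). The baskets, the thresholds, and the function name pair_rules are hypothetical; this is a minimal illustration limited to item pairs, not a full association rule miner.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical purchase data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "butter", "jam"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this data, the rule "butter -> bread" has confidence 1.0 because every basket containing butter also contains bread.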
3. Clustering
It is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
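As an illustration of grouping without labels, here is a minimal sketch of k-means on one-dimensional values. k-means is only one of many clustering methods; the starting centroids are passed in explicitly (an assumption made to keep the example deterministic), and the data is invented.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning values to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical measurements that form two visible groups.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(values, centroids=[1.0, 2.0])
print(centroids)  # [1.5, 10.5]
print(clusters)   # [[1.0, 1.5, 2.0], [10.0, 10.5, 11.0]]
```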
4. Classification
It is the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
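The e-mail example can be sketched with a small naive Bayes classifier, which learns word frequencies from labeled messages and applies them to new ones. Naive Bayes is one possible technique, chosen here for brevity; the training messages and the function names train and classify are hypothetical.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the given text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages (the "known structure").
examples = [
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project status report", "legitimate"),
]
word_counts, label_counts = train(examples)
print(classify("free prize offer", word_counts, label_counts))        # spam
print(classify("monday project meeting", word_counts, label_counts))  # legitimate
```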
5. Regression
It attempts to find a function which models the data with the least error.
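A common concrete choice of "least error" is least squares, where the fitted line minimizes the sum of squared differences between predicted and observed values. The sketch below fits a straight line y = slope * x + intercept using the closed-form solution; the data points are invented and the function name fit_line is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```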
6. Summarization
It provides a more compact representation of the data set, including visualization and report generation.
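A small sketch of summarization: many detail records are reduced to one line of statistics per group, which could then feed a report or chart. The record layout, field names, and the function name summarize are hypothetical.

```python
from collections import defaultdict

def summarize(records, key, value):
    """Group records by `key` and report count, total, and mean of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        group: {"count": len(vals), "total": sum(vals), "mean": sum(vals) / len(vals)}
        for group, vals in groups.items()
    }

# Hypothetical sales records.
sales = [
    {"region": "north", "amount": 100.0},
    {"region": "north", "amount": 200.0},
    {"region": "south", "amount": 50.0},
]
summary = summarize(sales, key="region", value="amount")
for region, stats in summary.items():
    print(region, stats)
```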
Architecture of Data Mining
1. Knowledge Base
This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.
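A concept hierarchy can be pictured as a mapping from each value to its more general parent. The sketch below assumes a hypothetical location hierarchy (city, then state or province, then country) and a helper named generalize that climbs a given number of levels.

```python
# Hypothetical concept hierarchy: each value maps to its more general parent.
PARENT = {
    "Chicago": "Illinois",
    "Springfield": "Illinois",
    "Toronto": "Ontario",
    "Illinois": "USA",
    "Ontario": "Canada",
}

def generalize(value, levels=1):
    """Climb `levels` steps up the hierarchy; values at the top stay unchanged."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value

print(generalize("Chicago"))     # Illinois
print(generalize("Chicago", 2))  # USA
print(generalize("USA"))         # USA
```

Rolling attribute values up in this way lets the same data be mined at the city, state, or country level of abstraction.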
2. Data Mining Engine
This engine is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.
3. Pattern Evaluation Module
This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns. It may use interestingness thresholds to filter out discovered patterns.
For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.
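To illustrate pushing the interestingness check into the mining process, the sketch below uses a minimum count as the threshold and applies it before pairs are generated: an item that is itself infrequent cannot be part of a frequent pair, so it is discarded early instead of being filtered out at the end. The baskets, the threshold, and the function name frequent_pairs_pruned are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs_pruned(baskets, min_count):
    """Count item pairs, pruning infrequent items before any pair is formed."""
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, c in item_counts.items() if c >= min_count}
    pair_counts = Counter()
    for basket in baskets:
        # Only items that passed the threshold can form candidate pairs.
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_count}

# Hypothetical purchase data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "butter", "jam"],
]
patterns = frequent_pairs_pruned(baskets, min_count=2)
print(patterns)
```

Here "cereal" appears only once, so no pair containing it is ever counted. On large databases this kind of early pruning can shrink the search space considerably.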
4. User Interface
This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results.
In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.