Data mining systems face many challenges and issues in today’s world. The main groups of issues are:
- Mining methodology and user interaction issues
- Performance issues
- Issues relating to the diversity of database types
Data Mining Issues
1. Mining methodology and user interaction issues:
i. Mining different kinds of knowledge in databases:
Different users want different kinds of knowledge, presented in different ways. Because each client may need a different kind of information, it is difficult for one system to cover the wide range of data and mining tasks needed to meet every client's requirements.
ii. Interactive mining of knowledge at multiple levels of abstraction:
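Interactive mining lets users focus the search for patterns and view the data and results from different angles and at different levels of abstraction, for example by drilling down into details or rolling up to summaries. The data mining process should be interactive because it is difficult to know in advance what can be discovered within a database.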
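As a minimal sketch of viewing the same data at more than one level of abstraction, the Python snippet below rolls up hypothetical sales records from the (region, city) level to the region level. The records and the `roll_up` helper are illustrative assumptions, not part of any particular data mining system.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, amount). Illustrative data only.
SALES = [
    ("East", "Boston", 120),
    ("East", "New York", 300),
    ("West", "Seattle", 150),
    ("West", "Portland", 80),
    ("East", "Boston", 60),
]

def roll_up(records, level):
    """Sum the amount (last field) grouped by the first `level` fields.

    level=1 gives one total per region (a rolled-up view);
    level=2 gives one total per (region, city) (a drilled-down view).
    """
    totals = defaultdict(int)
    for record in records:
        totals[record[:level]] += record[-1]
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(SALES, 1))  # {('East',): 480, ('West',): 230}
    print(roll_up(SALES, 2))  # per-city totals within each region
```

A user could start from the regional totals and drill down only into the region that looks interesting, which is the kind of step-by-step exploration interactive mining is meant to support.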
iii. Incorporation of background knowledge:
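Background knowledge, such as domain constraints or concept hierarchies, can be used to guide the discovery process and to express the discovered patterns concisely and at different levels of abstraction.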
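Concept hierarchies are one common form of background knowledge. The sketch below assumes a small, hypothetical hierarchy that maps cities to countries and uses it to generalize raw values before counting them, so that a pattern too sparse to show up at the city level becomes visible at the country level. The hierarchy, data, and function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical concept hierarchy (background knowledge): city -> country.
CITY_TO_COUNTRY = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "Boston": "USA",
}

def generalize(values, hierarchy):
    """Replace each value by its parent concept; values not in the hierarchy are kept."""
    return [hierarchy.get(v, v) for v in values]

def frequent_concepts(values, min_count):
    """Return the concepts that occur at least `min_count` times."""
    counts = Counter(values)
    return {concept: n for concept, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    customer_cities = ["Toronto", "Vancouver", "Boston", "Chicago", "Toronto", "Paris"]
    # City level: only Toronto reaches the threshold.
    print(frequent_concepts(customer_cities, 2))  # {'Toronto': 2}
    # Country level, after applying the hierarchy: both countries reach it.
    print(frequent_concepts(generalize(customer_cities, CITY_TO_COUNTRY), 2))  # {'Canada': 3, 'USA': 2}
```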
iv. Query languages and ad hoc mining:
Relational query languages (such as SQL) allow users to pose ad hoc queries for data retrieval. In the same way, a data mining query language should let users describe ad hoc mining tasks, and it should integrate well with the query language of the data warehouse.
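To illustrate the ad hoc style of querying that a data mining query language aims to extend, the sketch below uses Python's built-in `sqlite3` module on a hypothetical in-memory `purchases` table to answer a simple mining-style question: which items are bought by at least a given number of distinct customers. This is plain SQL, not a dedicated data mining query language; the table, data, and helper names are assumptions for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with hypothetical purchase records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [
            ("alice", "bread"), ("alice", "milk"),
            ("bob", "bread"), ("bob", "butter"),
            ("carol", "bread"), ("carol", "milk"),
        ],
    )
    return conn

def popular_items(conn, min_customers):
    """Ad hoc query: items bought by at least `min_customers` distinct customers."""
    rows = conn.execute(
        """
        SELECT item, COUNT(DISTINCT customer) AS n
        FROM purchases
        GROUP BY item
        HAVING COUNT(DISTINCT customer) >= ?
        ORDER BY n DESC, item
        """,
        (min_customers,),
    )
    return rows.fetchall()

if __name__ == "__main__":
    conn = build_demo_db()
    print(popular_items(conn, 2))  # [('bread', 3), ('milk', 2)]
```

Changing the threshold or the grouping column requires only a new query, not new code, which is the flexibility the paragraph asks a mining language to preserve.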
v. Handling noisy or incomplete data:
In a large database, many attribute values may be incorrect or missing. This may be caused by human error or by instrument failure. Data cleaning methods and robust data analysis methods are used to handle noisy data.
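A minimal sketch of two common cleaning steps, assuming numeric sensor-style readings where `None` marks a missing value: missing entries are filled with the median of the observed values, and readings outside 1.5 times the interquartile range beyond the quartiles (a widely used rule of thumb, chosen here as an assumption) are flagged as likely noise. The function names and data are hypothetical.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def flag_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

if __name__ == "__main__":
    # Hypothetical readings: two are missing, and 95.0 looks like an instrument fault.
    readings = [20.1, 19.8, None, 20.4, 95.0, 20.0, None, 19.9]
    cleaned = fill_missing(readings)
    print(cleaned)
    print(flag_outliers(cleaned))  # [4]
```

The median is used for filling because, unlike the mean, it is not pulled upward by the faulty 95.0 reading.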
2. Performance issues:
i. Efficiency and scalability of data mining algorithms:
To effectively extract information from a huge amount of data in databases, data mining algorithms must be efficient and scalable: their running time should remain predictable and acceptable as the database grows.
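One common way to keep an algorithm scalable is to make a single pass over the data in fixed-size chunks, so that memory use depends on the chunk size rather than on the size of the database. The sketch below assumes the data arrives as an iterable of numbers (for example, rows read lazily from a file or cursor) and computes the count, mean, and maximum in one pass; the chunk size and function names are illustrative assumptions.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def one_pass_summary(values, chunk_size=10_000):
    """Compute (count, mean, max) in one pass, holding one chunk in memory at a time."""
    count, total, maximum = 0, 0.0, None
    for block in chunks(values, chunk_size):
        count += len(block)
        total += sum(block)
        block_max = max(block)
        maximum = block_max if maximum is None else max(maximum, block_max)
    mean = total / count if count else None
    return count, mean, maximum

if __name__ == "__main__":
    # A generator stands in for a large table that is never fully loaded into memory.
    data = (x % 97 for x in range(1_000_000))
    print(one_pass_summary(data))
```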
ii. Parallel, distributed, and incremental mining algorithms:
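The huge size of many databases, the wide distribution of data, and the complexity of some data mining methods are factors motivating the development of parallel and distributed data mining algorithms. Such algorithms divide the data into partitions, which are processed in parallel, and then merge the partial results. Incremental algorithms, in turn, update previously mined knowledge when new data arrives instead of mining the whole database again from scratch.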
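The partition-then-merge pattern can be sketched for a simple mining task: counting how many transactions contain each item. This is a minimal illustration under stated assumptions, not a full parallel mining algorithm. The per-partition work is independent, so `partitioned_item_counts` accepts any `map`-like function (a hypothetical design choice): the built-in `map` runs the partitions sequentially, while the `__main__` block passes `ProcessPoolExecutor.map` to run them in separate worker processes.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_items(part):
    """Count, for each item, how many transactions in this partition contain it."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))  # count an item once per transaction
    return counts

def split(data, n_parts):
    """Split a list into at most n_parts contiguous partitions of near-equal size."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partitioned_item_counts(transactions, n_parts=4, map_fn=map):
    """Count items in each partition with `map_fn`, then merge the partial counts."""
    total = Counter()
    for local in map_fn(count_items, split(transactions, n_parts)):
        total.update(local)  # merge step: add the local counts
    return total

if __name__ == "__main__":
    transactions = [
        ["bread", "milk"], ["bread", "butter"], ["milk"],
        ["bread", "milk", "butter"], ["butter"], ["bread"],
    ]
    # Run each partition in a separate worker process, then merge the results.
    with ProcessPoolExecutor() as pool:
        print(partitioned_item_counts(transactions, n_parts=3, map_fn=pool.map))
```

Because the merge is plain addition, the same step also supports a simple form of incremental mining: counts for newly arrived transactions can be computed on their own and added to the stored totals without rescanning the old data.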
3. Issues relating to the diversity of database types:
i. Handling of relational and complex types of data:
Databases and data warehouses store many kinds of data. It is not practical for one system to mine all of these kinds of data, so specific data mining systems should be constructed for specific kinds of data.
ii. Mining information from heterogeneous databases and global information systems:
Data is fetched from many different sources connected by Local Area Networks (LANs) and Wide Area Networks (WANs). Discovering knowledge from these different sources of structured, semi-structured, and unstructured data is a great challenge for data mining.
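Before knowledge can be mined from heterogeneous sources, their records usually have to be mapped onto a common schema. The sketch below assumes two hypothetical sources that describe customers with different field names and different units (dollars versus cents); each source gets its own mapping function, and the results are combined into one uniform list. All names and records are illustrative assumptions.

```python
# Hypothetical records from two sources that describe customers differently.
SOURCE_A = [{"cust_id": "A1", "name": "Ana", "spend_usd": 120.0}]
SOURCE_B = [{"customerId": "B7", "fullName": "Bo", "spendCents": 4550}]

def from_source_a(rec):
    """Map a source-A record to the common schema (spend already in dollars)."""
    return {"id": rec["cust_id"], "name": rec["name"], "spend": rec["spend_usd"]}

def from_source_b(rec):
    """Map a source-B record to the common schema (convert cents to dollars)."""
    return {"id": rec["customerId"], "name": rec["fullName"], "spend": rec["spendCents"] / 100}

def integrate(sources):
    """Combine (records, mapping function) pairs into one list in the common schema."""
    unified = []
    for records, mapper in sources:
        unified.extend(mapper(rec) for rec in records)
    return unified

if __name__ == "__main__":
    for row in integrate([(SOURCE_A, from_source_a), (SOURCE_B, from_source_b)]):
        print(row)
```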
Major Challenges In Data Mining
Transforming data into organized information is not an easy process. There are many challenges in data mining.
Below are some of these Challenges listed and briefly explained:
1. Security and Social Challenges
Data mining depends on collecting and sharing data, so it requires considerable security. Private and sensitive information about individuals is collected to build customer profiles and to understand user behavior patterns. Illegal access to this information, and its confidential nature, have become significant issues.
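One common mitigation is to pseudonymize direct identifiers before data is shared for mining. The sketch below replaces a hypothetical `customer` field with a keyed hash (HMAC-SHA256 from Python's standard `hmac` and `hashlib` modules), so records from the same customer can still be linked without revealing who the customer is. This is an assumption-laden illustration, not a complete privacy solution: the key must be kept secret, and the remaining fields may still allow re-identification.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return the HMAC-SHA256 of the identifier as a 64-character hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def pseudonymize_records(records, secret_key):
    """Replace the hypothetical 'customer' field with a pseudonym; copy other fields."""
    return [
        {**rec, "customer": pseudonymize(rec["customer"], secret_key)}
        for rec in records
    ]

if __name__ == "__main__":
    key = b"example-secret-key"  # illustrative only; use a securely generated, secret key
    records = [
        {"customer": "alice@example.com", "item": "bread"},
        {"customer": "alice@example.com", "item": "milk"},
    ]
    for row in pseudonymize_records(records, key):
        print(row)
```

The same identifier always maps to the same pseudonym under one key, so purchase patterns per customer remain minable.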
2. Noisy and Incomplete Data
Data mining is the process of obtaining information from huge volumes of data. Real-world data is noisy, incomplete, and heterogeneous, and large amounts of data are often unreliable or inaccurate. These problems can be caused by human error or by faults in the instruments that measure the data.
3. Distributed Data
Real-world data is usually stored on different platforms in distributed computing environments: on the internet, on individual systems, or in separate databases. It is very difficult to bring all of this data into a centralized data repository, mainly for technical and organizational reasons.
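When raw data cannot be centralized, some statistics can still be computed globally from local summaries. The sketch below assumes a hypothetical protocol in which each site shares only its count, sum, and sum of squares, from which the global mean and population variance follow exactly. The naive sum-of-squares formula used here can lose precision for large values; numerically more stable pairwise combination formulas exist.

```python
from dataclasses import dataclass

@dataclass
class LocalSummary:
    """What a site shares instead of its raw values (hypothetical protocol)."""
    count: int
    total: float
    total_sq: float

def summarize(values):
    """Computed locally at each site; only this summary leaves the site."""
    return LocalSummary(len(values), float(sum(values)), float(sum(v * v for v in values)))

def combine(summaries):
    """Global mean and population variance from per-site summaries."""
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, variance

if __name__ == "__main__":
    site_a = summarize([1, 2, 3])  # stays at site A; only the summary is sent
    site_b = summarize([4, 5])
    print(combine([site_a, site_b]))  # (3.0, 2.0)
```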
4. Complex Data
Real-world data is highly heterogeneous. It can be multimedia data, natural language text, time series, spatial or temporal data, audio, video, images, and other complex data. It is very hard to handle these different types of data and extract the necessary information. In most cases, new tools and methods have to be developed to extract the relevant information.
5. Performance
The performance of a data mining system depends mainly on the efficiency of the techniques and algorithms it uses. If those techniques and algorithms are poorly designed, the performance of the data mining process will suffer.
6. Scalability and Efficiency of the Algorithms
Data mining algorithms must be scalable and efficient in order to extract information from the enormous amounts of data in a database.
7. Improvement of Mining Algorithms
Factors such as the complexity of data mining approaches, the enormous size of databases, and the wide distribution of data motivate the development of parallel and distributed data mining algorithms.
8. Incorporation of Background Knowledge
If background knowledge can be incorporated, more accurate and reliable data mining solutions can be found: predictive tasks can make more accurate predictions, while descriptive tasks can produce more useful findings. However, collecting and incorporating background knowledge is a complex process.