Data mining is the process of extracting patterns from data. It is now regarded as an important tool in modern business for turning raw data into business intelligence, which gives an organization an informational advantage. It is used in a wide range of profiling practices, such as surveillance, marketing, scientific discovery and fraud detection.
Process of data mining
Pre-processing: Before data mining algorithms can be applied, you need to assemble a target dataset. Remember that data mining can only uncover patterns that are already present in the data. The dataset must be large enough to contain these patterns, yet concise enough to be mined in a reasonable time. A common data source is the data warehouse. Pre-processing is necessary for analyzing multivariate datasets before the actual mining begins; it typically includes cleaning the target set by removing observations that contain noise or missing values.
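The pre-processing step can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it assumes records arrive as Python dictionaries, and the function name `preprocess` and its three cleaning rules (drop records with a missing field, drop exact duplicates, min-max scale each numeric field to the range 0 to 1) are hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def preprocess(records: List[Record], fields: List[str]) -> List[Dict[str, float]]:
    """Hypothetical cleaning pass over a target dataset."""
    # 1. Drop records with a missing value in any required field.
    complete = [r for r in records if all(r.get(f) is not None for f in fields)]

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for r in complete:
        key = tuple(r[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    if not unique:
        return []

    # 3. Min-max scale each field to [0, 1] so fields on different scales are comparable.
    low = {f: min(r[f] for r in unique) for f in fields}
    high = {f: max(r[f] for r in unique) for f in fields}
    scaled = []
    for r in unique:
        row = {}
        for f in fields:
            span = high[f] - low[f]
            # A constant field carries no information, so map it to 0.0.
            row[f] = (r[f] - low[f]) / span if span else 0.0
        scaled.append(row)
    return scaled


raw = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": None},    # incomplete: dropped
    {"age": 45, "income": 90000},
    {"age": 25, "income": 30000},   # duplicate: dropped
]
print(preprocess(raw, ["age", "income"]))
# → [{'age': 0.0, 'income': 0.0}, {'age': 1.0, 'income': 1.0}]
```

Scaling matters mainly for distance-based methods such as clustering: without it, a field measured in tens of thousands (income) would dominate one measured in tens (age).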
Data mining: The mining step itself commonly involves four classes of tasks:
- Clustering: This task involves discovering groups and structures in the data whose members are similar in some way, without relying on any known structure in the data.
- Classification: This task involves generalizing a known structure so that it can be applied to new data. For example, an email program may try to classify each incoming message as spam or genuine.
- Regression: This task involves finding a function that models the data with the least error.
- Association rule learning: This task involves searching for relationships between variables. For example, a supermarket might gather data on its customers' buying habits. Using association rule learning, it can determine which products are frequently bought together and use this information for marketing. This is also known as market basket analysis.
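The clustering task can be illustrated with k-means, one widely used clustering algorithm; the text above does not name a specific method, so this choice is ours. The sketch below is a minimal, standard-library-only version of Lloyd's k-means for 2-D points. The function name `kmeans`, its parameters and the sample points are hypothetical. Note that the function receives only the points and never any labels, which is what makes clustering work "without using the known structures in data".

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Label each 2-D point with a cluster index 0..k-1 (Lloyd's k-means)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise the centroids with k points drawn at random from the data.
    centroids = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(points, k=2))
```

The number of clusters `k` has to be chosen up front, and a different `seed` can lead to a different (sometimes worse) clustering, which is why k-means is often run several times from different starting points.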
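For the regression task, the simplest instance of "least error" is an ordinary least-squares fit of a straight line y = slope * x + intercept, which minimizes the sum of squared differences between observed and predicted values. Its closed-form solution is slope = Σ(x - mean_x)(y - mean_y) / Σ(x - mean_x)², intercept = mean_y - slope * mean_x. The sketch below assumes a single input variable, and the helper names `fit_line` and `sum_squared_error` are hypothetical; practical regression work usually relies on a statistics library and often on more than one variable.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is the same value.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    # Joint variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """The quantity that least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # → (2.0, 1.0)
```

Any other line through the same data has a larger `sum_squared_error`, which is exactly what "the least error" means for this model.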
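Market basket analysis is usually expressed through two measures: the support of an item set (the fraction of all baskets that contain it) and the confidence of a rule A → B (the fraction of baskets containing A that also contain B). The sketch below considers only rules between pairs of items and counts them by brute force; full association rule learners such as Apriori or FP-growth prune the search so they can handle larger item sets efficiently. The function name `pair_rules`, the thresholds and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (if-item, then-item, support, confidence)


def pair_rules(baskets: List[Set[str]], min_support: float, min_confidence: float) -> List[Rule]:
    """Find rules of the form "customers who buy A also buy B" for item pairs."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Sorting gives each unordered pair a single canonical order.
        pair_counts.update(combinations(sorted(basket), 2))
    rules: List[Rule] = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the baskets containing lhs, the share that also contain rhs.
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for rule in pair_rules(baskets, min_support=0.5, min_confidence=0.9):
    print(rule)  # → ('butter', 'bread', 0.5, 1.0)
```

Here every basket with butter also contains bread, so butter → bread has confidence 1.0, while bread → butter only reaches 2/3 and is filtered out by the 0.9 threshold.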
Validation of results: The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms also occur in the wider dataset. Not every pattern found by data mining is valid: an algorithm can find patterns in the data it was given that do not generalize to other data. To guard against this, the patterns are evaluated on a test set of data that was not used for mining. The learned patterns are applied to this test set, and the resulting output is compared with the desired output. Receiver operating characteristic (ROC) curves are another statistical method for evaluating the results of data mining. If the patterns do not meet the desired standards, the pre-processing and data mining steps must be re-evaluated and changed. If they do meet the standards, the last step is to interpret the patterns and turn them into knowledge.
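The validation step can be sketched with a hold-out test set and an ROC curve. In this minimal illustration, `holdout_split` keeps part of the data aside so that it is never used for mining, and `roc_points` and `roc_auc` evaluate a classifier's scores on that held-out part. Each ROC point is the false positive rate and true positive rate at one score threshold; the area under the curve (AUC) summarizes them, with 1.0 meaning a perfect ranking and 0.5 meaning no better than chance. All function names, the 30% test fraction and the example scores are assumptions for illustration, and labels are assumed to be 0/1 (for instance genuine/spam email).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], test_fraction: float = 0.3, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    # Mining uses only the training part; the test part is kept back for validation.
    return shuffled[:cut], shuffled[cut:]


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at every distinct score threshold."""
    pos = sum(labels)  # assumes labels are 1 (positive) or 0 (negative)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative test example")
    # Walk from the highest score down, i.e. from the strictest threshold to the loosest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every example tied at this score has been counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Labels: 1 = spam, 0 = genuine; scores are assumed to come from a trained classifier.
test_labels = [1, 0, 1, 0]
test_scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(roc_points(test_labels, test_scores)))  # → 0.75
```

An AUC of 0.75 here means that, for a randomly chosen spam and genuine pair from the test set, the classifier ranks the spam message higher three times out of four.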
Important applications of data mining
In business, data mining can be very productive when applied to customer relationship management (CRM). Instead of having a call center contact people at random, a business can concentrate its efforts on the prospects most likely to respond to an offer, and more sophisticated methods can be used to decide how to allocate these efforts. Data mining is also used in human resources (HR) to identify the characteristics of the most successful employees, information that can help HR focus its recruiting efforts. In retail sales, the application is known as market basket analysis. Data mining can also be applied on integrated circuit production lines.
Data mining is also widely used in science and engineering, in fields such as genetics, bioinformatics, medicine, electrical engineering and education.
Another application is spatial data mining, which applies data mining techniques to spatial data. It differs from ordinary data mining mainly in its end objective: finding patterns in geography.