Big Data & Data Mining

Introduction to Data Mining

Data mining involves finding interesting patterns in datasets. Big data involves large-scale storage and processing (often at datacenter scale) of large datasets. Data mining on big data (e.g., finding buying patterns in large purchase logs) is therefore of particular interest and is currently receiving a lot of attention. Not all big data tasks are data mining tasks (e.g., large-scale indexing), and not all data mining tasks involve big data.
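To make the purchase-log example concrete, the following is a minimal sketch of one simple kind of pattern discovery: counting which pairs of items are bought together often. The log contents, the function name `frequent_pairs`, and the `min_support` threshold are all assumptions made for illustration. At real big data scale the same counting would typically be distributed across many machines rather than run in a single process.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase log: each entry is the set of items in one transaction.
purchase_log = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "bread", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs bought together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair has one canonical ordering, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


result = frequent_pairs(purchase_log, min_support=2)
for pair, n in sorted(result.items()):
    print(pair, n)
```

On this toy log, only the pairs (bread, milk) and (bread, butter) reach the threshold of two transactions. Pair counting of this kind is only the first step of frequent-itemset methods; it is shown here as an illustration, not as a complete mining algorithm.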

History of Data Mining

The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes’ theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity, and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability. As data sets have grown in size and complexity, direct “hands-on” data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s).

Data mining is the process of applying these methods with the intention of uncovering hidden patterns[13] in large data sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever larger data sets.