Abstract

Data mining, or knowledge discovery in databases (KDD), is an interdisciplinary field that integrates techniques from several research areas, including machine learning, statistics, database systems, and pattern recognition, for the analysis of large volumes of possibly complex, highly distributed, and poorly organized data. The prosperity of the data mining field may be attributed to two essential reasons. First, a huge amount of data is collected and stored every day. On the one hand, with the continuing development of advanced technologies in many domains, data is generated at enormous speed. Examples include purchase data from department and grocery stores, bank and credit card transaction data, e-commerce data, Internet traffic data that describes the browsing history of Web users, remote sensor data from agricultural satellites, and gene expression data from microarray technology. On the other hand, progress in hardware technology allows today’s computer systems to store very large amounts of data. Second, with these large volumes of data at hand, data owners have a pressing need to turn them into useful knowledge. From a commercial viewpoint, the ultimate goal of data owners is to gain more and pay less in their business activities. Under competitive pressure, they want to enhance their services, develop cost-effective strategies, and target the right group of potential customers. From a scientific viewpoint, when traditional techniques are infeasible for dealing with the raw data, data mining may help scientists in many ways, such as classifying and segmenting data.
By applying the knowledge extracted through data mining, a business analyst may rate customers by their propensity to respond to an offer, a doctor may estimate the probability of an illness recurring, a website publisher may display customized Web pages to individual users according to their browsing habits, and a geneticist may discover novel gene-gene interaction patterns. In this talk, we aim to provide a general picture of the important data mining steps, topics, algorithms, and challenges.


