Abstract

INTRODUCTION

Data mining is the process of extracting valuable knowledge from large amounts of data. With the massive quantity of data stored in repositories, it is increasingly important to develop powerful analysis and decision-making tools for extracting interesting knowledge.

The task of classification is concerned with predicting the value of one field from the values of the other fields. The target field is called the class; the other fields are called attributes. Propositional machine learning algorithms assume that the input data is represented in a simple attribute-value format. Most existing data mining algorithms (including algorithms for classification, clustering, association analysis, and outlier detection) work on single tables. For example, a typical classification algorithm (e.g., C4.5 or SVM) works on a table containing many tuples, each of which has a class label and a value for each attribute in the table.

In recent years, there has been growing interest in multi-relational classification research and applications, which address the difficulties of dealing with a large relational search space, complex relationships between relations, and a daunting number of attributes. Most structured data is stored in relational databases, where it is spread across multiple relations according to its characteristics. Conventionally, many classification approaches can only be applied to a single relation. Applying them to multi-relational data often requires transforming the data into a single table by flattening and feature construction, a step known as propositionalization. However, many of these methods are heuristic, and flattening can cause problems such as long running times and statistical skew in the data. Multi-relational data mining (MRDM) has been successfully applied in a variety of areas, such as marketing, sales, finance, fraud detection, and the natural sciences.
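To make the flattening step concrete, the following is a minimal sketch of two ways to propositionalize a pair of relations. The `customers` and `orders` tables, their columns, and both function names are hypothetical and chosen only for illustration; they do not come from the paper. The naive join shows how flattening can skew the class distribution, while the aggregation variant is one simple form of feature construction.

```python
from collections import defaultdict

# Hypothetical target relation: one row per customer, with a class label.
customers = [
    {"customer_id": 1, "age": 34, "label": "good"},
    {"customer_id": 2, "age": 51, "label": "bad"},
]

# Hypothetical related relation: many orders per customer
# (customer_id acts as a foreign key into customers).
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 1, "amount": 5.0},
    {"order_id": 13, "customer_id": 2, "amount": 100.0},
]


def flatten_by_join(customers, orders):
    """Naive flattening: one output row per (customer, order) pair."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**by_id[o["customer_id"]], **o} for o in orders]


def flatten_by_aggregation(customers, orders):
    """Feature construction: summarise each customer's orders as new attributes."""
    amounts = defaultdict(list)
    for o in orders:
        amounts[o["customer_id"]].append(o["amount"])
    rows = []
    for c in customers:
        a = amounts.get(c["customer_id"], [])
        rows.append({**c, "order_count": len(a), "order_total": sum(a)})
    return rows


joined = flatten_by_join(customers, orders)
aggregated = flatten_by_aggregation(customers, orders)

# The join repeats customer 1 once per order, so the label counts no longer
# match the two-customer target relation.
label_counts = defaultdict(int)
for row in joined:
    label_counts[row["label"]] += 1

print(dict(label_counts))  # {'good': 3, 'bad': 1}
print(aggregated)
```

In the joined table the "good" label appears three times although only one customer carries it, which is one way the statistical skew mentioned above can arise. The aggregated table keeps one row per customer but compresses the order details into summary attributes, so some information is lost.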
Multi-relational data mining looks for patterns that involve multiple relations in a relational database. Its main difference from traditional data mining approaches is that it does not need to transform the data into a single table: it learns from the data in its original form, preserving its structure and incorporating that structure into the learning process.

RELATIONAL DATABASES

A relational database is a collection of tables called relations, each of which is assigned a unique name. Each relation consists of a set of attributes and stores a large set of tuples. Every tuple in a relational table represents an object that is identified by a unique key and described by a set of attribute values. Often one uses a semantic model to represent relational databases, allowing one to describe and design the database without having to pay attention to the physical database. Such a model is often referred to as a database schema. One of the most common models is the Entity-Relationship (ER) model (Figure 1).

A relational database typically consists of several tables (relations) and not just one. A schema for a relational database describes a set of entities DB = {E1, E2, ..., En} and a set of relationships between entities. Each row in a relation is a tuple. Each relation has at least one primary key attribute. The other attributes are either descriptive attributes or foreign key attributes. Foreign key attributes link to primary key attributes of other relations. A relational database contains multiple interconnected relations, each of which represents a certain kind of object or a type of relationship. Each of these named tables individually behaves like the single table that is the subject of propositional data mining. Data structures more complex than a single record are implemented by relating pairs of tables through so-called foreign key relations.
Such a relation specifies how certain columns in one table can be used to look up information in corresponding columns in the other table, thus relating sets of records in the two tables. …
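As an illustration of how a foreign key relation links records in two tables, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their columns are hypothetical examples, not the schema used in the paper.

```python
import sqlite3

# In-memory database; SQLite only enforces foreign keys once the pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: purchase.customer_id is a foreign key that refers to
# the primary key of customer.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 35.0), (12, 2, 100.0)],
)

# Use the foreign key column to look up the matching customer for each purchase.
rows = conn.execute("""
    SELECT c.name, p.amount
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY p.purchase_id
""").fetchall()
print(rows)  # [('Ann', 20.0), ('Ann', 35.0), ('Bob', 100.0)]

# A purchase that points at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO purchase VALUES (13, 99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The join relates each set of purchase records to the single customer record it refers to, which is the lookup behaviour described above. The final insert is expected to be rejected because customer 99 does not exist.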
