Abstract

Outlier detection is a process to find out the objects that have the abnormal behavior. It can be applied in many aspects, such as public security, finance and medical care. An information system (IS) as a database that shows relationships between objects and attributes. A real-valued information system (RVIS) is an IS whose information values are real numbers. A RVIS with missing values is an incomplete real-valued information system (IRVIS). The notion of inner boundary comes from the boundary region in rough set theory (RST). This paper conducts experiments directly in an IRVIS and investigates outlier detection in an IRVIS based on inner boundary. Firstly, the distance between two information values on each attribute of an IRVIS is introduced, and the parameter λ to control the distance is given. Then, the tolerance relations on the object set are defined according to the distance, by the way, the tolerance classes, the λ-lower and λ-upper approximations in an IRVIS are put forward. Next, the inner boundary under each conditional attribute in an IRVIS is presented. The more inner boundaries an object belongs to, the more likely it is to be an outlier. Finally, an outlier detection method in an IRVIS based on inner boundary is proposed, and the corresponding algorithm (DE) is designed, where DE means degree of exceptionality. Through the experiments base on UCI Machine Learning Repository data sets, the DE algorithm is compared with other five algorithms. Experimental results show that DE algorithm has the better outlier detection effect in an IRVIS. It is worth mentioning that for comprehensive comparison, ROC curve and AUC value are used to illustrate the advantages of the DE algorithm.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call