Abstract

Labeling large amounts of data is costly and often practically infeasible, so available data may have missing labels. In this article, we investigate outlier detection for partially labeled categorical data based on conditional information entropy. First, the equivalence class in a partially labeled categorical decision information system (p-CDIS) is introduced, so that missing labels can be predicted using conditional probability. Then, the conditional information entropy in a p-CDIS is calculated, which provides a more comprehensive measure of uncertainty. In addition, the relative information entropy and relative cardinality in a p-CDIS are proposed. Next, the degree of outlierness and the weight function are presented and used to compute outlier factors. Finally, an outlier detection method in a p-CDIS based on conditional information entropy is proposed, and a corresponding conditional information entropy algorithm (CEOF) is designed. To evaluate the stability of the CEOF algorithm, experiments are performed on ten datasets from the UCI Machine Learning Repository. In comparison with five other algorithms, the proposed method shows good effectiveness and adaptability on categorical data.
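The first two steps described above (equivalence classes, label prediction by conditional probability, and conditional information entropy) can be illustrated with a minimal Python sketch. It is not the CEOF algorithm: the abstract does not give the paper's definitions, so the sketch assumes the standard Shannon conditional entropy H(D | A) and assumes that a missing label takes the most frequent label among the labeled objects of its equivalence class. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def equivalence_classes(rows, attrs):
    """Group row indices that agree on every chosen condition attribute."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def fill_missing_labels(rows, labels, attrs):
    """Assumption (not from the paper): a missing label (None) takes the most
    probable label among the labeled objects of its equivalence class.
    It stays None when the class contains no labeled object."""
    filled = list(labels)
    for block in equivalence_classes(rows, attrs):
        known = Counter(labels[i] for i in block if labels[i] is not None)
        if not known:
            continue
        best = known.most_common(1)[0][0]
        for i in block:
            if filled[i] is None:
                filled[i] = best
    return filled


def conditional_entropy(rows, labels, attrs):
    """Standard Shannon conditional entropy H(D | attrs) in bits,
    computed over the objects whose label is not None."""
    idx = [i for i, y in enumerate(labels) if y is not None]
    n = len(idx)
    if n == 0:
        return 0.0
    sub_rows = [rows[i] for i in idx]
    sub_labels = [labels[i] for i in idx]
    h = 0.0
    for block in equivalence_classes(sub_rows, attrs):
        weight = len(block) / n
        counts = Counter(sub_labels[i] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= weight * p * math.log2(p)
    return h


# Toy data: two categorical condition attributes and one decision label.
rows = [("red", "s"), ("red", "s"), ("red", "s"),
        ("blue", "l"), ("blue", "l"), ("blue", "s")]
labels = ["yes", "yes", None, "no", "yes", "no"]

filled = fill_missing_labels(rows, labels, attrs=[0, 1])
h_both = conditional_entropy(rows, filled, attrs=[0, 1])
h_first = conditional_entropy(rows, filled, attrs=[0])
```

On this toy data the third object inherits the label "yes" from its equivalence class, and conditioning on both attributes leaves less uncertainty about the label than conditioning on the first attribute alone (h_both is 1/3 bit, below h_first).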

