Abstract

Representing unlabeled data is a major challenge in data mining. Unsupervised learning methods such as clustering and dimensionality reduction form the basis of most data representations, but these models rarely address how combinations of attributes and their interactions affect the data. A representation model supported by machine learning concepts can reveal more about the nature of the underlying data. We present a novel unsupervised minimum attribute instance selection (UMAIS) labeling algorithm, which selects a categorical attribute as the class label, and a novel attribute-based powerset generation (APSG) algorithm, which describes how relevant attribute sets are formed using correlation and the powerset. Using these algorithms, we present a diagrammatic representation, known as Representational Automata, that depicts the importance of interactions among the correlated and non-correlated attributes in an unlabeled dataset. We performed experiments on two large-scale datasets from the energy and financial domains and compared our approach with other standard classifiers. Our approach obtains a significantly better classification accuracy of 92.187% and 87.32% on the energy and financial datasets, respectively, compared with 74% and 82% for the linear classifier.
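The abstract does not give the details of either algorithm, so the following Python sketch is only an illustration of the two ideas it names: choosing a categorical attribute as a class label, and forming attribute sets from correlation and the powerset. Everything specific here is an assumption rather than the authors' method: the function names, the toy rows, the reading of "minimum attribute instance" as "fewest distinct values", the use of Pearson correlation, and the 0.8 threshold.

```python
from itertools import combinations


def pick_label_attribute(rows, categorical):
    """Assumed reading of UMAIS: pick the categorical attribute with the
    fewest distinct values (ties go to the first attribute listed)."""
    return min(categorical, key=lambda a: len({r[a] for r in rows}))


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_attributes(rows, numeric, threshold=0.8):
    """Attributes that appear in at least one pair with |r| >= threshold.
    The threshold is a hypothetical choice, not taken from the paper."""
    cols = {a: [r[a] for r in rows] for a in numeric}
    keep = set()
    for a, b in combinations(numeric, 2):
        if abs(pearson(cols[a], cols[b])) >= threshold:
            keep.update((a, b))
    return sorted(keep)


def powerset(attrs):
    """All non-empty subsets of attrs, smallest first."""
    return [set(c) for k in range(1, len(attrs) + 1)
            for c in combinations(attrs, k)]


# Toy rows invented for illustration only.
rows = [
    {"region": "N", "tariff": "A", "load": 1.0, "temp": 2.0, "price": 5.0},
    {"region": "S", "tariff": "B", "load": 2.0, "temp": 4.1, "price": 1.0},
    {"region": "N", "tariff": "C", "load": 3.0, "temp": 6.0, "price": 4.0},
    {"region": "S", "tariff": "A", "load": 4.0, "temp": 8.2, "price": 2.0},
]

label = pick_label_attribute(rows, ["region", "tariff"])
correlated = correlated_attributes(rows, ["load", "temp", "price"])
attribute_sets = powerset(correlated)
```

On these toy rows, `region` has fewer distinct values than `tariff`, and only `load` and `temp` are strongly correlated, so the candidate attribute sets are the non-empty subsets of those two. Because a powerset grows as 2^n, any practical version would presumably restrict n first, which is one plausible role for the correlation step.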
