Abstract
To identify the data examples of rare categories that form small compact clusters in large data sets, most existing approaches require a sufficiently large set of labeled data examples to train a classifier, and they assume that the rare-category clusters are spherical or nearly spherical. However, a large enough training set is usually difficult to obtain in practice, and rare categories in many real-world applications form small compact clusters with arbitrary shapes. In this paper, we investigate how to identify all data examples of a rare category with an arbitrary shape based on only one seed (i.e., a labeled rare-category data example). Instead of finding a compact and spherical local region around the seed, we locally explore the data set from the seed by repeatedly searching for and visiting the $k$-nearest neighbors of each newly visited data example. The local exploration connects the data examples in the target rare category through the $k$-nearest-neighbor relationship; meanwhile, suspected external data examples are filtered out if they are not close enough to any visited data example. Experiments on both synthetic and real-world data sets verify the effectiveness and efficiency of our approach.
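The seed-based exploration described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how "close enough" is decided, so the fixed cutoff `max_dist`, the brute-force neighbor search, and the function name `explore_from_seed` are all assumptions made for illustration. The sketch also simplifies the filter by testing a candidate only against the visited example currently being expanded.

```python
import math
from collections import deque


def explore_from_seed(points, seed, k, max_dist):
    """Breadth-first k-nearest-neighbor exploration from one seed index.

    points   : list of equal-length coordinate tuples
    seed     : index of the single labeled rare-category example
    k        : number of nearest neighbors examined per visited example
    max_dist : hypothetical cutoff (an assumption, not from the paper);
               a neighbor farther than this from the example being
               expanded is treated as external and skipped
    """
    visited = {seed}
    queue = deque([seed])
    while queue:
        i = queue.popleft()
        # Brute-force search for the k nearest neighbors of example i.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in neighbors:
            # Keep only unvisited neighbors that pass the distance filter.
            if j not in visited and math.dist(points[i], points[j]) <= max_dist:
                visited.add(j)
                queue.append(j)
    return sorted(visited)


# Toy data: an elongated (non-spherical) chain of 10 points plus 3 far-away points.
points = [(float(x), 0.0) for x in range(10)]
points += [(50.0, 50.0), (-40.0, 30.0), (30.0, -60.0)]
found = explore_from_seed(points, seed=0, k=2, max_dist=1.5)
```

In this toy data set the exploration walks along the chain one neighbor at a time, so the whole elongated cluster is recovered from the single seed while the distant points are never visited. A spherical region around the seed large enough to cover the chain would be a much poorer fit to its shape.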
More From: IEEE Transactions on Knowledge and Data Engineering