Abstract

In this paper, we study the use of the Local Outlier Factor (LOF) algorithm for open-set classification of high-dimensional data. Because our application is the classification of text documents, we examine the fastText method for extracting feature vectors. We then build a classifier and evaluate its precision and recall on prepared test data that contains both subject categories known during training and entirely new categories. Next, we use LOF to identify erroneous outcomes in which documents from new categories are assigned to one of the trained classes. Finally, we show how the decision-function threshold in LOF influences the precision and recall of this open-set classification procedure.
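To make the role of the threshold concrete, the following is a minimal sketch of LOF-based rejection of unknown categories, not the implementation used in the paper. It rests on several assumptions that do not come from the paper: synthetic Gaussian vectors stand in for fastText document embeddings, a single LOF model is fitted over all known-class training vectors, and the class `SimpleLOF`, the helper names, the neighbourhood size `k=10`, and the threshold values are all hypothetical choices made for illustration.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(point, train, k, skip=None):
    """Return (distance, index) pairs for the k nearest training vectors."""
    pairs = sorted((dist(point, t), i) for i, t in enumerate(train) if i != skip)
    return pairs[:k]

class SimpleLOF:
    """Minimal LOF for novelty detection: fit on known data, score new points."""

    def __init__(self, train, k=10):
        self.train, self.k = train, k
        nbrs = [knn(p, train, k, skip=i) for i, p in enumerate(train)]
        self.kdist = [n[-1][0] for n in nbrs]    # k-distance of each training point
        self.lrd = [self._lrd(n) for n in nbrs]  # local reachability density

    def _lrd(self, nbrs):
        # reach-dist(p, o) = max(k-distance(o), d(p, o)); lrd is the inverse mean
        reach = [max(self.kdist[j], d) for d, j in nbrs]
        return len(reach) / max(sum(reach), 1e-12)

    def score(self, q):
        """LOF of q: mean density of its neighbours divided by q's own density."""
        nbrs = knn(q, self.train, self.k)
        return sum(self.lrd[j] for _, j in nbrs) / (len(nbrs) * self._lrd(nbrs))

# Assumption: Gaussian clusters stand in for fastText document vectors.
rng = random.Random(0)
DIM = 8

def cluster(center, n):
    return [[rng.gauss(center, 1.0) for _ in range(DIM)] for _ in range(n)]

train = cluster(0.0, 60) + cluster(6.0, 60)           # two known categories
known_test = cluster(0.0, 20) + cluster(6.0, 20)      # unseen docs, known categories
unknown_test = cluster(-10.0, 20)                     # docs from a new category

lof = SimpleLOF(train, k=10)
known_scores = [lof.score(v) for v in known_test]
unknown_scores = [lof.score(v) for v in unknown_test]

def precision_recall(threshold):
    """Precision and recall of flagging a document as 'unknown' when LOF > threshold."""
    tp = sum(s > threshold for s in unknown_scores)
    fp = sum(s > threshold for s in known_scores)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / len(unknown_scores)
    return precision, recall

results = [(t, *precision_recall(t)) for t in (1.0, 1.5, 2.0, 3.0)]
for t, p, r in results:
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold can only reduce the number of documents rejected as unknown, so recall for the unknown class cannot increase, while precision typically improves as fewer known-category documents are flagged. For real experiments, a library implementation is likely preferable. For example, scikit-learn's `LocalOutlierFactor` with `novelty=True` exposes a `decision_function`, whose sign convention is the reverse of the one used here (lower values indicate outliers).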
