Abstract
The performance of automatic text classification depends heavily on the quality of the training data. In practice, however, training data often contains noise, and obtaining entirely noise-free data is often difficult, expensive, or time-consuming. To address this problem, a novel text classification algorithm is proposed based on sparse distributed representation (SDR), which is extremely tolerant to noise. The algorithm first creates a class-SDR for each class label by merging category feature vectors using a subsampling technique. It then assigns a class label to a document by comparing the overlap between the document's SDR and each class-SDR. The experimental results show that, when the training data is noisy, the algorithm outperforms six frequently used text classification algorithms.
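The two steps described above (building one class-SDR per label, then classifying by overlap) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how documents are encoded or how the subsampling is done, so the hashing encoder, the "keep the most frequent bits" subsampling rule, and all names and constants below (`N_BITS`, `BITS_PER_TOKEN`, `CLASS_ACTIVE`, `doc_sdr`, `build_class_sdrs`, `classify`) are assumptions made purely for illustration.

```python
import zlib
from collections import Counter

# All constants are illustrative assumptions, not values from the paper.
N_BITS = 2048        # width of every SDR
BITS_PER_TOKEN = 4   # active bits contributed by each token
CLASS_ACTIVE = 40    # maximum active bits kept in a class-SDR


def token_bits(token):
    """Map a token to a small, deterministic set of bit positions (assumed encoder)."""
    return {
        zlib.crc32(f"{token}:{i}".encode("utf-8")) % N_BITS
        for i in range(BITS_PER_TOKEN)
    }


def doc_sdr(text):
    """Represent a document as the set of active bit indices of its tokens."""
    bits = set()
    for token in text.lower().split():
        bits |= token_bits(token)
    return bits


def build_class_sdrs(docs, labels, n_active=CLASS_ACTIVE):
    """Merge the SDRs of each class, then subsample to the most frequent bits.

    Bits that are active in only a few (possibly mislabeled) documents are
    the first to be dropped, which is the assumed source of noise tolerance.
    """
    counts = {}
    for text, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(doc_sdr(text))
    return {
        label: {bit for bit, _ in counter.most_common(n_active)}
        for label, counter in counts.items()
    }


def classify(text, class_sdrs):
    """Return the label whose class-SDR shares the most active bits with the document."""
    sdr = doc_sdr(text)
    return max(class_sdrs, key=lambda label: len(sdr & class_sdrs[label]))


# Toy usage with made-up training data.
train_docs = [
    "football match goal team",
    "team wins football cup",
    "software code compiler bug",
    "compiler software release code",
]
train_labels = ["sports", "sports", "tech", "tech"]
class_sdrs = build_class_sdrs(train_docs, train_labels)
predicted = classify("football team goal", class_sdrs)
```

Representing each SDR as a set of active indices keeps the overlap computation a plain set intersection. A real system would likely use a learned or semantic encoder rather than hashing, so that similar words share bits.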