Abstract

Text categorization is one of the well-studied problems in data mining and information retrieval. Although research on text categorization has progressed considerably, traditional approaches require encoding documents into numerical vectors, which leads to two main problems: huge dimensionality and a sparse distribution in each numerical vector. Many feature selection methods have been developed to address the first problem, but the reduced dimension still remains large. If a feature selection method reduces the dimension excessively, the robustness of text categorization is degraded. As a solution to these problems, this research encodes medical documents into string vectors and applies them to a novel competitive neural network that takes string vectors as input. The quantitative experimental results demonstrate that this method can significantly improve the performance of medical document classification.
