Online biomedical publication classification using Multi-Instance Multi-Label algorithms with feature reduction

Dong Ren, Matthew D Turner, Peter T Fox, Yanqing Zhang, Raj Sunderraman, Angela R Laird, Jessica A Turner, Long Ma

doi:10.1109/icci-cc.2015.7259391

Abstract

Text annotation, the assignment of metadata to documents, requires significant time and effort when performed by humans. A variety of text mining methods have been used to automate this process, many of them based on either keyword extraction or word counts. However, using keywords as text classification features raises two challenges. (1) The number of training instances is often much smaller than the number of extracted features, and this high dimensionality can degrade classification performance. (2) Documents must be assigned multiple, non-exclusive labels (multi-label classification), which makes the task more complicated than single-label classification. As an example, we use a set of expertly labeled documents from the human functional neuroimaging literature and apply a Multi-Instance Multi-Label (MIML) classification algorithm to the problem. To address (1), we apply a feature reduction approach to reduce the feature dimension. To address (2), we use an MIML algorithm called MIMLfast to implement the multi-label classification.
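
The data layout implied by the two challenges can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not say which feature reduction technique is used, so a simple document-frequency threshold stands in for it, and MIMLfast itself is not implemented. The toy corpus, the threshold `MIN_DF`, and the names `tokens` and `vectorize` are all hypothetical. Each document is treated as a bag of instances (sentences), each instance becomes a word-count vector over the reduced vocabulary, and the non-exclusive labels become one binary indicator row per document.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a bag of instances (sentences here)
# paired with a set of non-exclusive labels.
docs = [
    (["subjects viewed faces", "fmri showed amygdala activation"],
     {"vision", "emotion"}),
    (["subjects heard tones", "fmri showed auditory cortex activation"],
     {"audition"}),
    (["subjects viewed words", "subjects heard words"],
     {"vision", "audition"}),
]


def tokens(bag):
    """Return the set of distinct words across all instances in a bag."""
    return {word for sentence in bag for word in sentence.split()}


# Document frequency: in how many documents does each word occur?
df = Counter()
for bag, _ in docs:
    df.update(tokens(bag))

full_vocab = sorted(df)

# Assumed stand-in for feature reduction: drop words seen in fewer than
# MIN_DF documents. The paper's actual reduction method may differ.
MIN_DF = 2
vocab = [word for word in full_vocab if df[word] >= MIN_DF]


def vectorize(sentence):
    """Word-count vector of one instance over the reduced vocabulary."""
    counts = Counter(sentence.split())
    return [counts[word] for word in vocab]


# Multi-instance side: one list of instance vectors per document.
bags = [[vectorize(sentence) for sentence in bag] for bag, _ in docs]

# Multi-label side: one binary indicator row per document.
label_names = sorted({label for _, labels in docs for label in labels})
Y = [[int(label in labels) for label in label_names] for _, labels in docs]

print(len(full_vocab), "features reduced to", len(vocab))
print("labels:", label_names)
print("label matrix:", Y)
```

Even in this toy corpus the full vocabulary is larger than the number of documents, which is the situation described in (1). The pair `bags` and `Y` is the kind of input an MIML learner would be trained on.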

Full Text