Abstract

Text classification is an important task with applications in many areas, including the biomedical domain, bioinformatics, text mining, and information retrieval. Many methods have been proposed in the literature for classifying textual data into binary classes. Multi-label text classification, on the other hand, becomes a real challenge when the number of classes is large or when high accuracy is required. The complexity grows further when the content is morphologically rich, yet solutions to this problem are eagerly sought. For decades, there has been extensive discussion of how to select relevant features in text analytics and text mining. This paper proposes three categories of features for biomedical text, namely statistical, biomedical, and linguistic features, and uses them to categorize the abstracts of biomedical literature. The proposed model applies the SVM classification algorithm to the Hallmarks of Cancer dataset, using more than 1800 abstracts.
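The abstract names three feature groups and an SVM classifier but gives no implementation details. The following is a minimal sketch of how such a pipeline could be wired together, not the authors' method. The term list, the suffix list, the specific features in each group, and all function names are hypothetical, and the SVM is a plain subgradient-descent trainer applied one-vs-rest so that an abstract can receive several labels.

```python
import re

# Hypothetical mini-lexicon standing in for a real biomedical vocabulary (assumption).
BIOMEDICAL_TERMS = {"tumor", "apoptosis", "angiogenesis", "metastasis", "kinase"}
# Hypothetical suffix list used as a crude morphological cue (assumption).
NOMINAL_SUFFIXES = ("tion", "sis", "ity")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text):
    """Concatenate statistical, biomedical, and linguistic features."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    # Statistical: token count and type/token ratio.
    statistical = [float(n), len(set(tokens)) / n]
    # Biomedical: fraction of tokens found in the lexicon.
    biomedical = [sum(t in BIOMEDICAL_TERMS for t in tokens) / n]
    # Linguistic: fraction of tokens ending in a nominalizing suffix.
    linguistic = [sum(t.endswith(NOMINAL_SUFFIXES) for t in tokens) / n]
    return statistical + biomedical + linguistic


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only examples inside the margin push the boundary.
            if margin < 1.0:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def train_one_vs_rest(X, label_sets, labels):
    """Train one binary SVM per label; label_sets[i] is the set of labels of X[i]."""
    models = {}
    for label in labels:
        y = [1 if label in s else -1 for s in label_sets]
        models[label] = train_linear_svm(X, y)
    return models


def predict_labels(models, x):
    """Return every label whose SVM gives x a positive score."""
    predicted = set()
    for label, (w, b) in models.items():
        if sum(wi * xi for wi, xi in zip(w, x)) + b > 0:
            predicted.add(label)
    return predicted
```

In this sketch, each abstract would be passed through `featurize`, the resulting vectors and their label sets would go to `train_one_vs_rest`, and `predict_labels` would assign zero or more labels to a new abstract. A realistic setup would presumably scale the features first, since the raw token count is much larger than the ratio features, and would use a tested SVM library rather than this hand-written trainer.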
