Abstract

Review of feature extraction approaches on biomedical text classification

Highlights

  • Text classification could be one of the solutions to overcome these problems. Text classification is a challenging research topic because of the need to organize and categorize the growing number of electronic documents worldwide

  • We calculate each feature weight for both training and testing documents using the Term Frequency–Inverse Document Frequency (TF-IDF) equation $w_{i,d} = tf_{i,d} \times \log\left(\frac{n}{df_i}\right)$ (or, in smoothed form, with the ratio $\frac{n+1}{df_i+1}$), where the term frequency $tf_{i,d}$ is the number of times term $i$ occurs in document $d$, with $d = 1, \ldots, m$, the document frequency $df_i$ is the total number of documents that contain term $i$, and $n$ is the total number of documents

  • We conduct several experiments for text classification using a set of features extracted with the Genia Tagger tool and a set of features extracted by medical experts from Pusat Perubatan Universiti Kebangsaan Malaysia (PPUKM)
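
The TF-IDF weighting mentioned in the highlights can be illustrated with a minimal sketch. It assumes the plain form, term frequency multiplied by log(n / df), with the natural logarithm; the log base, the function name `tfidf_weights`, and the toy documents are assumptions made for illustration and are not taken from the paper. Smoothing is not included.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weight of term i in document d
    is tf(i, d) * log(n / df(i)), using the natural log (an assumption).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n / df[term]) for term, count in tf.items()}
        )
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["fever", "rash"],
]
weights = tfidf_weights(docs)
```

In this toy corpus, "headache" appears in only one of three documents, so it receives a larger per-occurrence weight than "fever" or "cough", which each appear in two. A term that occurs in every document would get weight zero under this unsmoothed form.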

Summary

Introduction

Text classification could be one of the solutions to overcome these problems. Raza and Qamar (2016) presented a comparison of feature selection methods on large datasets such as Gisette, Isolate, Musk-2, UjlindoorLoc, Egg-Eye-style and Internet advertisement for classification purposes. They found that using Odds Ratio as a feature selection method produced higher accuracy than the other feature selection methods. Feng et al. (2015) compared several feature selection methods, such as Information Gain, Chi-squared and Odds Ratio, for classifying the MPH-20 and 20 Newsgroups datasets. In similar research, Tutkan et al. (2016) proposed a new feature selection method named Meaning Based Feature Selection (MBFS). They compared the performance of their proposed method with other methods such as Information Gain, Chi-squared and Odds Ratio.

Data collection
Text preprocessing
Eliminate stop words and general terms
Perform feature extraction
Create a vocabulary list
Perform feature selection
Calculate feature weighting
Perform text classification
Experiments
Results and discussion
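
Two of the steps in the outline above, eliminating stop words and creating a vocabulary list, can be sketched as follows. This is an illustrative sketch only: the stop-word set, the whitespace tokenizer, the function names, and the sample sentences are all hypothetical, and it does not reproduce the Genia Tagger or expert-based feature extraction used in the paper.

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "with", "was", "is"}

def preprocess(text):
    """Split on whitespace, strip edge punctuation, lowercase, drop stop words."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

def build_vocabulary(token_lists):
    """Return the sorted list of distinct terms across all documents."""
    return sorted({tok for tokens in token_lists for tok in tokens})

# Hypothetical sample documents.
docs = [
    "The patient was admitted with fever and cough.",
    "Chest X-ray of the patient is normal.",
]
tokenized = [preprocess(d) for d in docs]
vocabulary = build_vocabulary(tokenized)
```

The resulting vocabulary is the set of candidate features on which a later feature selection and weighting step would operate.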