Abstract

Chemical Compound Extraction refers to the task of recognizing chemical instances such as oxygen nitrogen and others. The majority of studies that addressed the task of chemical compound extraction used machine-learning techniques. The key challenge behind using machine-learning techniques lies in employing a robust set of features. In fact, the literature shows that there are numerous types of features used in the task of chemical compound extraction. Such dimensionality of features can be determined via data representation. Some researchers have used N-gram representation for biomedical-named entity recognition, where the most significant terms are represented as features. Meanwhile, others have used detailed-attribute representation in which the features are generalized. As a result, identifying the best combination of features to yield high-accuracy classification becomes challenging. This paper aims to apply the Wrapper Subset Selection approach using two data representations—N-gram and detailed-attributes. Since each data representation would suit a specific classification algorithm, two classifiers were utilized—Naí¯ve Bayes (for detailed-attributes) and Support Vector Machine (for N-gram). The results show that the application of feature selection using detailed-attributes outperformed that of N-gram representation by achieving a 0.722 f-measure. Despite the higher classification accuracy, the selected features using detailed-attribute representation have more meaning and can be applied for further datasets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.