Exploiting and integrating rich features for biological literature classification

Hongning Wang,Minlie Huang,Xiaoyan Zhu,Shilin Ding

doi:10.1186/1471-2105-9-s3-s4

Abstract

Background: Efficient features play an important role in automated text classification, which facilitates access to large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features, and domain-dependent features can significantly improve classification performance. The major issue studied in this paper is how to effectively select and integrate different types of features to improve biological literature classification.

Results: To classify the biological literature efficiently, we propose a novel feature value schema, TF*ML; features ranging from the lower-level, domain-independent “string feature” to the higher-level, domain-dependent “semantic template feature”; and proper integrations among these features. Compared with our previous approaches, performance improves by 11.5% in AUC and 8.8% in F-score, and exceeds the best performance achieved in BioCreAtIvE 2006.

Conclusions: Different types of features possess different discriminative capabilities in literature classification. Proper integration of domain-independent and domain-dependent features can significantly improve performance and overcome over-fitting on the data distribution.

Highlights

Efficient features play an important role in automated text classification, which facilitates access to large-scale data
We investigate biological literature classification from the perspective of feature selection and integration, evaluated on BioCreAtIvE [10], an international evaluation in biological text mining
The experimental results clearly demonstrate that lower-level features generalize better but are hampered by lower accuracy, whereas higher-level features contain rich domain-dependent information, with better specificity but poor universality

Summary

Introduction

Efficient features play an important role in automated text classification, which facilitates access to large-scale data. Biological structures and terminologies are described by a large number of features, and domain-dependent features can significantly improve classification performance. Regev et al. used expert-defined rules to extract features from semi-structured text and figure legends. They utilized external lexical resources and semantic constraints to achieve better coverage and accuracy [3]. Ghanem et al. utilized expert-edited regular expressions to capture frequently occurring keyword combinations (or motifs) within short segments of the text in a document [5]. All these approaches require domain experts to identify the specific textual objects and the informative templates, so they cannot be automatically extended into an efficient, scale-free model for other biological datasets [6].
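The regular-expression approach mentioned above can be illustrated with a minimal sketch. It is not the method of Ghanem et al. or of this paper: the patterns, the names `MOTIF_PATTERNS`, `split_segments` and `motif_features`, and the use of sentences as "short segments" are all assumptions made for illustration.

```python
import re

# Hypothetical motif patterns. The actual expert-edited expressions
# are not given in this text; these two are illustrative only.
MOTIF_PATTERNS = {
    "interacts_with": re.compile(r"\binteract(?:s|ed|ion)?\s+with\b", re.IGNORECASE),
    "binds_to": re.compile(r"\bbind(?:s|ing)?\s+to\b", re.IGNORECASE),
}


def split_segments(text):
    """Split text into short segments (assumption: naive sentence splitting)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def motif_features(text):
    """Count, for each motif, the number of segments in which it occurs."""
    counts = {name: 0 for name in MOTIF_PATTERNS}
    for segment in split_segments(text):
        for name, pattern in MOTIF_PATTERNS.items():
            if pattern.search(segment):
                counts[name] += 1
    return counts


doc = "Protein A interacts with protein B. It also binds to DNA. No signal here."
features = motif_features(doc)
print(features)
```

Every pattern in such a scheme has to be written by hand, which is the dependence on domain experts that the paragraph above identifies as the obstacle to extending these approaches to other datasets.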

Methods

Results

Conclusion


Journal: BMC Bioinformatics	Publication Date: Apr 1, 2008
Citations: 27	License type: cc-by


