Abstract

BackgroundOne important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up to date data for database construction and updating, and even text summarization. In this paper we present a machine learning method to identify these genotype-phenotype relationships. No large human-annotated corpus of genotype-phenotype relationships currently exists. So, a semi-automatic approach has been used to annotate a small labelled training set and a self-training method is proposed to annotate more sentences and enlarge the training set.ResultsThe resulting machine-learned model was evaluated using a separate test set annotated by an expert. The results show that using only the small training set in a supervised learning method achieves good results (precision: 76.47, recall: 77.61, F-measure: 77.03) which are improved by applying a self-training method (precision: 77.70, recall: 77.84, F-measure: 77.77).ConclusionsRelationships between genotypes and phenotypes is biomedical information pivotal to the understanding of a patient’s situation. Our proposed method is the first attempt to make a specialized system to identify genotype-phenotype relationships in biomedical literature. We achieve good results using a small training set. To improve the results other linguistic contexts need to be explored and an appropriately enlarged training set is required.

Highlights

  • One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes

  • Recall, and F-measure with and without this type of phenotype involved in genotype-phenotype relationships labelled in the test set. – Generic expressions referring to a genotype or a phenotype earlier in the text are tagged in this corpus as genotypes and phenotypes

  • For example locus is tagged as a genotype in the following sentence: “Our original association study focused on the role of IBD5 in CD; we explored the potential contribution of this locus to UC susceptibility in 187 German trios.”

Read more

Summary

Introduction

One important type of information contained in biomedical research literature is the newly discovered relationships between phenotypes and genotypes. Because of the large quantity of literature, a reliable automatic system to identify this information for future curation is essential. Such a system provides important and up to date data for database construction and updating, and even text summarization. Experts want to know whether a disease is caused by a genotype or whether a certain genotype determines particular human characteristics. This information is very valuable for researchers, clinicians, and patients. There exist some manually curated resources such as OMIM [1] which are repositories for this information, but they do not provide complete coverage of all genotype-phenotype

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.