Abstract
Background: Curation of gene-disease associations published in literature should be based on careful and frequent survey of the references that are highly related to specific gene-disease associations. Retrieval of the references is thus essential for timely and complete curation.
Results: We present a technique CRFref (Conclusive, Rich, and Focused References) that, given a gene-disease pair <g, d>, ranks high those biomedical references that are likely to provide conclusive, rich, and focused results about g and d. Such references are expected to be highly related to the association between g and d. CRFref ranks candidate references based on their scores. To estimate the score of a reference r, CRFref estimates and integrates three measures: degree of conclusiveness, degree of richness, and degree of focus of r with respect to <g, d>. To evaluate CRFref, experiments are conducted on over one hundred thousand references for over one thousand gene-disease pairs. Experimental results show that CRFref performs significantly better than several typical types of baselines in ranking high those references that expert curators select to develop the summaries for specific gene-disease associations.
Conclusion: CRFref is a good technique to rank high those references that are highly related to specific gene-disease associations. It can be incorporated into existing search engines to prioritize biomedical references for curators and researchers, as well as those text mining systems that aim at the study of gene-disease associations.
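As a rough illustration of the ranking step described in the abstract, the sketch below scores each candidate reference for one gene-disease pair and sorts the candidates by score. The abstract does not say how the three measures are estimated or how they are integrated, so the `Reference` class, the precomputed measure values, and the use of a simple product to combine them are all assumptions made for illustration, not CRFref's actual formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reference:
    """Hypothetical container for one candidate reference r of a pair <g, d>."""
    ref_id: str
    conclusiveness: float  # assumed to be precomputed, in [0, 1]
    richness: float        # assumed to be precomputed, in [0, 1]
    focus: float           # assumed to be precomputed, in [0, 1]


def score(ref: Reference) -> float:
    # Assumption: the three measures are integrated by multiplication.
    # The source only states that they are "integrated".
    return ref.conclusiveness * ref.richness * ref.focus


def rank_references(candidates: List[Reference]) -> List[Reference]:
    # Higher-scoring references are ranked first.
    return sorted(candidates, key=score, reverse=True)


# Made-up measure values, for illustration only.
candidates = [
    Reference("r1", conclusiveness=0.9, richness=0.2, focus=0.5),
    Reference("r2", conclusiveness=0.8, richness=0.7, focus=0.9),
    Reference("r3", conclusiveness=0.1, richness=0.9, focus=0.9),
]
ranked_ids = [r.ref_id for r in rank_references(candidates)]
print(ranked_ids)
```

With a product, a reference that is weak on any one measure (such as `r3`, which has low conclusiveness) is pushed down the list even if it is strong on the other two. A weighted sum would be an equally plausible choice under the same assumptions.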
Highlights
Curation of gene-disease associations published in literature should be based on careful and frequent survey of the references that are highly related to specific gene-disease associations.
It is quite difficult to curate gene-disease associations in a complete and timely manner, mainly for two reasons: (1) a huge and ever-growing number of biomedical references need to be searched for the large number of possible gene-disease pairs, and (2) curation of even a single gene-disease association needs to be based on a careful survey of biomedical literature.
Timely and complete curation is both costly and challenging, since the curation needs to be based on careful and frequent survey of the references that are highly related to specific gene-disease associations.
Summary
Curation of gene-disease associations published in literature should be based on careful and frequent survey of the references that are highly related to specific gene-disease associations. To facilitate knowledge sharing and further research, the gene-disease associations reported in the literature need to be curated, and several online databases of gene-disease associations have been built and maintained. Typical examples of such databases are Genetics Home Reference (GHR) and Online Mendelian Inheritance in Man (OMIM). It is quite difficult to curate gene-disease associations in a complete and timely manner, mainly for two reasons: (1) a huge and ever-growing number of biomedical references need to be searched for the large number of possible gene-disease pairs, and (2) curation of even a single gene-disease association needs to be based on a careful survey of biomedical literature (e.g., for each association the curators of GHR need to carefully find and check multiple articles to exclude unproven or controversial information). As a result, new research findings often take time to be curated from biomedical literature [1,2,3,4].