Assigning species information to corresponding genes by a sequence labeling framework.

Ling Luo, Rezarta Islamaj, Zhiyong Lu, Qingyu Chen, Po-Ting Lai, Chih-Hsuan Wei

doi:10.1093/database/baac090

Abstract

The automatic assignment of species information to the corresponding genes in a research article is a critically important step in the gene normalization task, whereby a gene mention is normalized and linked to a database record or an identifier by a text-mining algorithm. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, using a novel deep learning-based framework, to identify whether there is a relation between a gene and a species. Instead of the traditional binary classification framework, in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence labeling task, so that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach obtains significantly higher performance than the rule-based baseline method on the species assignment task (accuracy improves from 65.8% to 81.3%). The source code and data for species assignment are freely available. Database URL: https://github.com/ncbi/SpeciesAssignment
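
To make the contrast in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It shows a co-occurrence heuristic of the kind the abstract calls the rule-based baseline, the number of candidates a pairwise binary classifier would have to score, and one possible sequence-labeling framing. The abstract does not describe the input encoding, so the `Mention` type, the token offsets, the "nearest species" rule, and the choice of one labeling instance per distinct species are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; not taken from the paper or its code."""
    text: str
    kind: str   # "gene" or "species"
    start: int  # token offset within the article


def nearest_species_baseline(mentions):
    """Co-occurrence heuristic (assumed form): give each gene the closest
    species mention. Expects at least one species mention in the article."""
    species = [m for m in mentions if m.kind == "species"]
    return {
        g.text: min(species, key=lambda s: abs(s.start - g.start)).text
        for g in mentions
        if g.kind == "gene"
    }


def pairwise_candidates(mentions):
    """Binary-classification framing: one candidate per (gene, species) pair."""
    genes = [m for m in mentions if m.kind == "gene"]
    species = [m for m in mentions if m.kind == "species"]
    return list(product(genes, species))


def sequence_instances(mentions):
    """Sequence-labeling framing (assumed): one instance per distinct species,
    in which every gene mention would receive a yes/no tag in a single pass."""
    genes = [m for m in mentions if m.kind == "gene"]
    species_names = sorted({m.text for m in mentions if m.kind == "species"})
    return [(name, genes) for name in species_names]
```

Under these assumptions, an article with three gene mentions and two species mentions yields six pairwise candidates but only two labeling instances, which is the sense in which fewer items need to be evaluated.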

Journal: Database : the journal of biological databases and curation
Publication Date: Oct 13, 2022
Citations: 3
License type: cc-by

Similar Papers

Semi-supervised training for conditional random fields with pseudo auxiliary task
Jie Liu ... Yalou Huang
01 Jul 2011

Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge
Martin Krallinger ... Florian Leitner
Genome Biology | VOL. 9
01 Jan 2008

Integration of gene normalization stages and co-reference resolution using a Markov logic network
Hong-Jie Dai ... Wen-Lian Hsu
Bioinformatics | VOL. 27
17 Jun 2011

A Statistical Language Model for Pre-Trained Sequence Labeling: A Case Study on Vietnamese
Xianwen Liao ... Yongzhong Huang
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 21
13 Dec 2021
