Abstract

Because manual curation is labor-intensive and time-consuming, it is increasingly being replaced by text mining (TM) in metabolic information extraction. Although TM has previously been applied to the metabolic domain, it remains challenging because of the variety of domain-specific terms and their context-dependent meanings. Named Entity Recognition (NER) is generally used to identify keywords of interest (protein and metabolite terms) in a sentence, so this preliminary task strongly influences the performance of a metabolic TM framework. NER based on Conditional Random Fields (CRFs) has been widely used over the last decade because it clearly outperforms other approaches. However, an effective CRF-based NER depends entirely on the quality of its corpus, and producing a high-quality corpus is a nontrivial task. This paper introduces a hybrid solution that combines CRF-based NER, dictionary usage, and complementary modules (constructed from an existing corpus) to improve the performance of NER in the metabolic domain and in other similar domains.
