Automated recognition of brain region mentions in neuroscience literature.

Leon French

doi:10.3389/neuro.11.029.2009

Abstract

The ability to computationally extract mentions of neuroanatomical regions from the literature would assist linking to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining of neuroscience literature, we have created a corpus of manually annotated brain region mentions. The corpus contains 1,377 abstracts with 18,242 brain region annotations. Interannotator agreement was evaluated for a subset of the documents, and was 90.7% and 96.7% for strict and lenient matching, respectively. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words. For automatic extraction of brain region mentions we evaluated simple dictionary methods and complex natural language processing techniques. The dictionary methods based on neuroanatomical lexicons recalled 36% of the mentions with 57% precision. The best performance was achieved using a conditional random field (CRF) with a rich feature set. Features were based on morphological, lexical, syntactic and contextual information. The CRF recalled 76% of mentions at 81% precision; when partial matches are counted, recall and precision increase to 86% and 92%, respectively. We suspect a large amount of error is due to coordinating conjunctions, previously unseen words and brain regions of less commonly studied organisms. We found context windows, lemmatization and abbreviation expansion to be the most informative techniques. The corpus is freely available at http://www.chibi.ubc.ca/WhiteText/.
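The difference between strict and partial (lenient) matching in the abstract can be illustrated with a small sketch. It assumes mentions are represented as character-offset spans and that a lenient match is any overlap between a predicted and a gold span; the abstract does not state the exact lenient criterion, so that rule, the function names and the example spans are illustrative assumptions only.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(predicted: List[Span], gold: List[Span],
                     lenient: bool = False) -> Tuple[float, float]:
    """Span-level precision and recall.

    Strict: a predicted span counts only if its boundaries equal a gold span.
    Lenient (assumed here): any overlap with a gold span counts.
    """
    if lenient:
        hit_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
        hit_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(predicted)
        hit_pred = sum(p in gold_set for p in predicted)
        hit_gold = sum(g in pred_set for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans: one exact match, one boundary error, one spurious span.
gold = [(0, 10), (20, 35), (50, 60)]
predicted = [(0, 10), (25, 35), (70, 80)]

strict_p, strict_r = precision_recall(predicted, gold)
lenient_p, lenient_r = precision_recall(predicted, gold, lenient=True)
```

In this toy case the boundary error at (25, 35) is a miss under strict matching but a hit under lenient matching, which is the same kind of gap the abstract reports between the strict and partial-match scores.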

Highlights

Bioinformatics has proven the value of databasing and formalizing knowledge
We show the top weights for the state transition from outside a brain region mention to inside one, which occurs at the first word of a brain region mention
The General Architecture for Text Engineering (GATE) tokenizer split the corpus into 17,247 sentences and 461,552 tokens, with 46,340 tokens labelled as brain regions
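A minimal sketch of how annotated mentions might be turned into per-token inside/outside labels, together with a simple context-window feature, is given here. It uses a regular-expression tokenizer as a stand-in for the GATE tokenizer, and the two-label scheme ("I"/"O"), the feature names and the example sentence are assumptions for illustration rather than the paper's actual configuration.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def label_tokens(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Label a token "I" if it lies fully inside an annotated span, else "O"."""
    labelled = []
    for token, start, end in tokenize(text):
        inside = any(s <= start and end <= e for s, e in spans)
        labelled.append((token, "I" if inside else "O"))
    return labelled


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Lower-cased current word plus its neighbours within `size` positions."""
    feats = {"word": tokens[i].lower()}
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


# Hypothetical sentence with two annotated mentions.
text = "Projections from the lateral geniculate nucleus reach visual cortex."
mentions = ["lateral geniculate nucleus", "visual cortex"]
spans = [(text.find(m), text.find(m) + len(m)) for m in mentions]

labelled = label_tokens(text, spans)
words = [token for token, _ in labelled]
features = window_features(words, 3, size=1)  # features for "lateral"
```

The outside-to-inside transition mentioned in the highlights corresponds to the step from "the" (O) to "lateral" (I) in this example; a sequence model such as a CRF would learn weights for that transition alongside the per-token features.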

Summary

Introduction

Bioinformatics has proven the value of databasing and formalizing knowledge. Much of the focus is on molecular biology, but neuroscience researchers are taking note (French and Pavlidis, 2007). One way of populating, or at least seeding, knowledge bases is text mining, or the automated extraction and formalization of information from free text sources such as the biomedical literature. There has been much interest in applying text mining to extracting information about genes and proteins. In the BioCreative 2 challenge, 44 teams competed to extract, resolve and link protein and gene mentions (Krallinger et al., 2008), and the methods work well enough to be of practical importance in creating databases (Leitner et al., 2008). There has been less work on how to apply such techniques to domain-specific knowledge in neuroscience.



Journal: Frontiers in Neuroinformatics	Publication Date: Jan 1, 2009
Citations: 22	License type: cc-by
