BLUIMA, an NLP Pipeline for Neuroscience

Renaud Richardet¹*, Martin Telefont¹, Jean-Cédric Chappelier² and Sean L. Hill¹

¹ EPFL, SV, Switzerland
² EPFL, Switzerland

The growing volume of published neuroscientific literature makes it impossible for researchers to read, manually curate and integrate all the newly available information. A rich landscape of tools exists to automate knowledge extraction from scientific papers; however, this landscape is fragmented and lacks interoperability. Moreover, while some tools target the biomedical domain, few are specific to neuroscience.

BLUIMA is an integrated suite of software components for natural language processing of neuroscientific literature (neuroNLP). It is based on the high-performance Apache UIMA framework and provides UIMA components that wrap state-of-the-art NLP tools so they can be used interchangeably in processing pipelines. BLUIMA also includes original models and tools specific to neuroscience, as well as readers for several neuroscientific corpora (e.g. the WhiteText brain regions corpus). A robust PDF reader module performs precise text extraction from scientific articles in PDF format.

BLUIMA also includes pre-processing modules for sentence segmentation, word tokenization and part-of-speech tagging (JulieLab), as well as lemmatization (BioLemmatizer) and abbreviation recognition (BioAdi). The MongoDB module stores UIMA documents in MongoDB, the leading NoSQL database.

Lexicon-based named entity recognizers (NERs) are available for organism names, age and sex, brain regions, cell and subcellular names, and protein and gene names. BLUIMA also provides an NER built on the NIFSTD brain ontology, and another built on the BioLexicon, a lexical-terminological resource of nearly 2.2 million lexical entries from the biomedical domain.
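To illustrate what "interchangeable components" and a lexicon-based NER can look like, here is a minimal Python sketch. It is not the UIMA API (UIMA is primarily a Java framework built around a Common Analysis Structure and analysis engines) and not BLUIMA's code: the names `Document`, `Annotation`, `regex_tokenizer`, `make_dictionary_ner` and `run_pipeline` are hypothetical, and the NER uses a simple greedy longest-match dictionary lookup as an assumed stand-in for BLUIMA's lexical recognizers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    begin: int  # start character offset (inclusive)
    end: int    # end character offset (exclusive)
    label: str  # annotation type, e.g. "Token" or "BrainRegion"


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Hypothetical component interface: a callable that adds annotations to a Document.
Component = Callable[[Document], None]


def regex_tokenizer(doc: Document) -> None:
    """Toy tokenizer: words (optionally hyphenated) and single punctuation marks."""
    for m in re.finditer(r"\w+(?:-\w+)*|[^\w\s]", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "Token"))


def make_dictionary_ner(lexicon: Dict[str, str]) -> Component:
    """Build a lexicon-based NER using greedy longest match over lowercased tokens."""
    entries = {tuple(term.lower().split()): label for term, label in lexicon.items()}
    max_len = max((len(key) for key in entries), default=0)

    def ner(doc: Document) -> None:
        tokens = sorted((a for a in doc.annotations if a.label == "Token"),
                        key=lambda a: a.begin)
        words = [doc.text[t.begin:t.end].lower() for t in tokens]
        i = 0
        while i < len(tokens):
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                label = entries.get(tuple(words[i:i + n]))
                if label is not None:
                    doc.annotations.append(
                        Annotation(tokens[i].begin, tokens[i + n - 1].end, label))
                    i += n
                    break
            else:  # no lexicon entry starts at token i
                i += 1

    return ner


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    """Apply components in order; any callable with the same signature can be swapped in."""
    for component in components:
        component(doc)
    return doc


if __name__ == "__main__":
    lexicon = {"hippocampus": "BrainRegion", "pyramidal cell": "Cell", "mouse": "Organism"}
    doc = run_pipeline(Document("CA1 pyramidal cell activity in the mouse hippocampus"),
                       [regex_tokenizer, make_dictionary_ner(lexicon)])
    for a in doc.annotations:
        if a.label != "Token":
            print(a.label, doc.text[a.begin:a.end])
```

Because every component shares one signature, replacing `regex_tokenizer` with another tokenizer leaves the rest of the pipeline unchanged. Note that this toy matcher would miss "pyramidal cells"; matching on lemmas (such as those a lemmatizer produces) is one way a real pipeline could handle inflected forms.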
Finally, BLUIMA wraps several machine learning-based NERs for chemicals, species and proteins. These components are packaged into a freely available, standalone software suite with minimal dependencies. Furthermore, a simple scripting language allows the components to be configured in a straightforward format.

Evaluation was performed on a random sample of 10,000 PubMed abstracts indexed with the MeSH term "Neuroscience". On this dataset, 97.0% of all tokens were recognized and mapped by one or more components. In conclusion, BLUIMA is an effort to integrate available tools and to develop new ones for the processing of neuroscientific literature.

Keywords: Natural Language Processing, content extraction, UIMA, Neurosciences, neuroNLP

Conference: Neuroinformatics 2013, Stockholm, Sweden, 27 Aug - 29 Aug, 2013.
Presentation Type: Demo
Topic: Infrastructural and portal services

Citation: Richardet R, Telefont M, Chappelier J and Hill SL (2013). BLUIMA, an NLP Pipeline for Neuroscience. Front. Neuroinform. Conference Abstract: Neuroinformatics 2013. doi: 10.3389/conf.fninf.2013.09.00050

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 30 Apr 2013; Published Online: 11 Jul 2013.

* Correspondence: Mr. Renaud Richardet, EPFL, SV, Lausanne, Switzerland, renaud.richardet@epfl.ch
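The abstract does not specify exactly how a token counts as "recognized and mapped". The following minimal Python sketch assumes a token is covered when at least one annotation from any component overlaps it, and computes the fraction of covered tokens in the style of the 97.0% evaluation figure. The functions `token_coverage` and `span_of` and the component names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def token_coverage(tokens: List[Span], spans_by_component: Dict[str, List[Span]]) -> float:
    """Fraction of tokens overlapping at least one span from any component.

    Assumed definition: a token counts as recognized if any annotation overlaps it.
    """
    if not tokens:
        return 0.0
    all_spans = [s for spans in spans_by_component.values() for s in spans]
    covered = sum(
        1 for tb, te in tokens
        if any(sb < te and tb < se for sb, se in all_spans)  # half-open interval overlap
    )
    return covered / len(tokens)


def span_of(text: str, phrase: str) -> Span:
    """Offsets of the first occurrence of phrase in text (illustration helper)."""
    start = text.index(phrase)
    return (start, start + len(phrase))


if __name__ == "__main__":
    text = "Mouse hippocampal neurons were recorded"
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    spans = {
        "organism_ner": [span_of(text, "Mouse")],
        "brain_region_ner": [span_of(text, "hippocampal")],
        "cell_ner": [span_of(text, "neurons")],
    }
    print(f"{token_coverage(tokens, spans):.1%}")  # 3 of 5 tokens covered -> 60.0%
```

The overlap test is quadratic in the number of tokens and spans per document; for a corpus of thousands of abstracts, sorting the spans and sweeping through each document once would scale better.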