Automated annotation of functional imaging experiments via multi-label classification

Matthew D Turner, George F Luger, Chayan Chakrabarti, Angela R Laird, Jiawei F Xu, Thomas B Jones, Jessica A Turner, Peter T Fox

doi:10.3389/fnins.2013.00240

Abstract

Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text.
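To make the combination of binary relevance and NB concrete, the following is a minimal sketch in plain Python, not the off-the-shelf software used in the study. Binary relevance trains one independent yes/no classifier per label, and exact match counts an abstract as correct only when the full predicted label set equals the expert's. The toy sentences and the label names ("visual", "auditory", "button") are invented for illustration and are not actual CogPO terms; the whitespace tokenizer and add-one smoothing are likewise assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no further pre-processing.
    return text.lower().split()


class BinaryNB:
    """Multinomial naive Bayes for a single yes/no label (add-one smoothing)."""

    def fit(self, docs, targets):
        self.vocab = {w for d in docs for w in d}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for doc, target in zip(docs, targets):
            self.counts[target].update(doc)
            self.n_docs[target] += 1
        return self

    def predict(self, doc):
        total = sum(self.n_docs.values())
        scores = {}
        for c in (True, False):
            # Smoothed log prior plus smoothed log likelihood of each known word.
            score = math.log((self.n_docs[c] + 1) / (total + 2))
            denom = sum(self.counts[c].values()) + len(self.vocab)
            for w in doc:
                if w in self.vocab:
                    score += math.log((self.counts[c][w] + 1) / denom)
            scores[c] = score
        return scores[True] > scores[False]


def fit_binary_relevance(texts, label_sets):
    # Binary relevance: one independent binary classifier per label.
    docs = [tokenize(t) for t in texts]
    labels = sorted(set().union(*label_sets))
    return {
        lab: BinaryNB().fit(docs, [lab in s for s in label_sets])
        for lab in labels
    }


def predict_labels(models, text):
    doc = tokenize(text)
    return {lab for lab, model in models.items() if model.predict(doc)}


def exact_match(predicted, gold):
    # Fraction of documents whose predicted label set equals the gold set.
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy corpus standing in for annotated abstracts.
TRAIN_TEXTS = [
    "subjects viewed faces and pressed a button",
    "subjects heard tones and pressed a button",
    "subjects viewed pictures passively",
    "subjects heard words passively",
]
TRAIN_LABELS = [
    {"visual", "button"},
    {"auditory", "button"},
    {"visual"},
    {"auditory"},
]

models = fit_binary_relevance(TRAIN_TEXTS, TRAIN_LABELS)
print(predict_labels(models, "subjects viewed words and pressed a button"))
```

Because each label is scored separately, a single wrong per-label decision makes the whole abstract count as a miss under exact match, which is one reason exact match figures can be much lower than per-label accuracy.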

Highlights

Scientific publication in cognitive neuroscience today is proceeding at an intense pace; a pubmed.gov search revealed that for the 4-year period 2009–2012, there were 5033 total publications tagged “human brain mapping,” with the number of publications between 2009 and 2012 increasing by 12% each year.
We present the methods in more detail than is perhaps common in the text-mining community, both to make these results more repeatable by others and to introduce these methods to neuroimaging researchers interested in automated annotation who may not otherwise be aware of them.
We directly compare the various combinations of problem transformation method and machine learning algorithm on the abstract-alone corpus for each of the seven Cognitive Paradigm Ontology (CogPO) label dimensions.

Summary

Introduction

Scientific publication in cognitive neuroscience today is proceeding at an intense pace: a pubmed.gov search revealed that for the 4-year period 2009–2012 there were 5033 total publications tagged “human brain mapping,” with the number of publications increasing by 12% each year over that period. We are faced with a deluge of new results and publications across all fields every year (Howe et al., 2008). This has created problems for data warehousing, searching, and curation, where curation refers to the acquisition, selection, annotation, and maintenance of digital information. Curating this massive collection of scientific literature is a challenging problem. Controlled vocabularies limit language to terms with precise unitary meanings, and ontologies replicate some of the logical structure of scientific language in a computable fashion, allowing researchers to search and process the scientific literature more effectively.
