More than 90% of the world’s data was created in 2013 and 2014, and the pace is accelerating. Researchers have three main options for building knowledge bases from this burgeoning literature: reading the articles (or abstracts) themselves, relying on manual curation by experts, or using a specialized automated text-mining tool. Reading only the most prominent sources is not enough: 42% of highly cited papers appear in journals that are not traditionally highly cited, and in 57% of cases important information is mentioned only in the body of the article, not in the abstract. Manual curation has its own limits, because even the best human curators are not perfectly accurate. In one study in which three curators manually annotated PubMed articles with Gene Ontology Annotation terms, only 39% of the assigned terms were identical across curators. In another study, three experts annotating medical events in clinical narratives achieved an average precision of 88%. Full text also surfaces findings earlier: between 2003 and 2012, more than one third of co-occurrences appeared in the body of an article before they were published in abstracts. The current version of Elsevier’s text-mining technology achieves 98% accuracy for entity detection and 88% accuracy for relationship extraction. One of the most popular tools for text mining and presenting the results is Elsevier’s Pathway Studio. The number of articles in which researchers mention Pathway Studio has grown every year since 2003, reaching 170 in 2014. Citations of articles based on abstracts have also grown every year since 2004, reaching 153 in 2014, and the number of patents has grown every year since 2006, reaching 20 in 2014. Combining content from Elsevier’s top-quality journals with proprietary automated text-mining technology that can process and extract critical information from millions of full-text scientific articles and tens of millions of abstracts within hours gives researchers a compelling competitive advantage.