Crowd enabled curation and querying of large and noisy text mined protein interaction data

Hasan M Jamil, Fereidoon Sadri

doi:10.1007/s10619-017-7209-x

Abstract

The abundance of mined, predicted and uncertain biological data warrants massive, efficient and scalable curation efforts. The human expertise required for any successful curation enterprise is often economically prohibitive, especially for speculative end-user queries that may ultimately not bear fruit. The challenge, therefore, lies in devising a low-cost engine capable of delivering fast but tentative annotation and curation of a set of data items that experts can later validate authoritatively with a significantly smaller investment. The aim is thus to make a large volume of predicted data available for use as early as possible, with an acceptable degree of confidence in its accuracy, while the curation continues. In this paper, we present a novel approach to the annotation and curation of biological database contents using crowd computing. The technical contribution lies in the identification and management of trust of mechanical turks, and in support for ad hoc declarative queries, both of which are leveraged to enable reliable analytics using noisy predicted interactions. While the proposed approach and the CrowdCure system are designed for curating literature-mined protein-protein interaction data, they are amenable to substantial generalization.

Full Text