Advanced Query Research Articles

Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration into curation workflows has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named neXtA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it requires not only the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the neXtA5 triage service is improved by 190% for the prioritization of papers with PPI information and by 260% for papers with PTM information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents effectively improves triage in complex curation tasks such as the curation of PPIs and PTMs.

Database URL: http://candy.hesge.ch/nextA5
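The abstract does not say how the relevance score and the search engine score are combined, so the following is only a minimal sketch under stated assumptions: a linear mix of the two scores, with relevance taken as the density of marked-up descriptors in an article. The names `Article`, `refine_query`, `relevance`, `rank` and the `weight` parameter are all hypothetical and are not part of neXtA5.

```python
from dataclasses import dataclass

# The two example keywords named in the abstract; the full list is not given.
PPI_KEYWORDS = ("binds", "interacts")


@dataclass
class Article:
    pmid: str
    search_score: float   # assumed: score from the vector-space search engine
    descriptor_hits: int  # assumed: number of marked-up descriptor occurrences
    n_tokens: int         # assumed: article length in tokens


def refine_query(query: str, keywords=PPI_KEYWORDS) -> str:
    """Query refinement: append task-specific keywords to the original query."""
    return query + " " + " ".join(keywords)


def relevance(article: Article) -> float:
    """Assumed relevance estimate: descriptor density in the article."""
    if article.n_tokens == 0:
        return 0.0
    return article.descriptor_hits / article.n_tokens


def rank(articles, weight: float = 0.5):
    """Return PMIDs ordered by an assumed linear mix of the two scores."""
    def combined(a: Article) -> float:
        return (1 - weight) * a.search_score + weight * relevance(a)

    return [a.pmid for a in sorted(articles, key=combined, reverse=True)]
```

With `weight=0`, the ranking reduces to the search engine's own order, which makes it easy to compare against a baseline without descriptor annotations.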


Background: MicroRNAs (miRNAs) are short nucleotides that interact with their target genes through 3′ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) harbors an increasing amount of cancer genome data for both tumor and normal samples. However, there are few visualization tools focusing on concurrently displaying important relationships and attributes between miRNAs and mRNAs of both cancer tumor and normal samples. Moreover, a deep investigation of miRNA-mRNA target and biological relationships across multiple cancer types by integrating web-based analysis has not been thoroughly conducted.

Results: We developed an interactive visualization tool called MMiRNA-Viewer that can concurrently present the co-relationships of expression between miRNA-mRNA pairs of both tumor and normal samples in a single graph. The input file of MMiRNA-Viewer contains the expression information, including fold changes between normal and tumor samples for mRNAs and miRNAs, the correlation between mRNA and miRNA, and the target relationship predicted by a number of databases. Users can also load their own input data into MMiRNA-Viewer to visualize and compare detailed information about cancer-related gene expression changes, as well as changes in the expression of transcription-regulating miRNAs. To validate MMiRNA-Viewer, eight types of TCGA cancer datasets with both normal and control samples were selected in this study, and three filter steps were subsequently applied. We performed Gene Ontology (GO) analysis for the genes in the final selected 238 pairs and also for genes in the top 5% (95th percentile) for each of the eight cancer types to report a significant number of genes involved in various biological functions and pathways. We also calculated various centrality measures for the largest connected component(s) in each of the eight cancers and reported the top genes and miRNAs with high centrality.

Conclusions: With its user-friendly interface, dynamic visualization and advanced queries, we believe MMiRNA-Viewer offers an intuitive approach for visualizing and elucidating co-relationships between miRNAs and mRNAs of both tumor and normal samples. We suggest that miRNA and mRNA pairs with opposite fold changes of their expression and with inverted correlation values between tumor and normal samples might be most relevant for explaining the decoupling of mRNAs and their targeting miRNAs in tumor samples for certain cancer types.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1219-y) contains supplementary material, which is available to authorized users.
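The selection criterion suggested in the conclusions (opposite fold changes, and correlation values that invert between tumor and normal samples) can be sketched as a simple filter. This is an illustration only, not the MMiRNA-Viewer implementation: the record layout, the field names, and the assumption that fold changes are given as log2 values (so that the sign encodes the direction of change) are all hypothetical.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    mirna: str
    mrna: str
    mirna_log2fc: float  # assumed: log2 fold change, tumor vs. normal
    mrna_log2fc: float   # assumed: log2 fold change, tumor vs. normal
    corr_tumor: float    # miRNA-mRNA expression correlation in tumor samples
    corr_normal: float   # miRNA-mRNA expression correlation in normal samples


def opposite_sign(a: float, b: float) -> bool:
    """True when a and b are both non-zero and have different signs."""
    return a * b < 0


def is_candidate(p: Pair) -> bool:
    """Opposite fold changes AND correlation sign flipped between conditions."""
    return (opposite_sign(p.mirna_log2fc, p.mrna_log2fc)
            and opposite_sign(p.corr_tumor, p.corr_normal))


def select_candidates(pairs):
    """Keep only the pairs that satisfy both criteria."""
    return [p for p in pairs if is_candidate(p)]
```

A pair whose miRNA is up while its mRNA is down, and whose correlation is negative in normal samples but positive in tumor samples, would pass this filter; a pair meeting only one of the two conditions would not.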


Articles published on Advanced Query

Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer's Disease: A Systematic Review.

TS Corpus Project: An online Turkish Dictionary and TS DIY Corpus

Fractal: An advanced multidimensional range query lookup protocol on nested rings for distributed systems

Real-time social media retrieval with spatial, temporal and social constraints

Evaluating Well-Formedness Constraints on Incomplete Models

Triage by ranking to support the curation of protein interactions.

Informatization of Education: Pedagogical Services of the University Library

The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters.

Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

Dissecting the biological relationship between TCGA miRNA and mRNA sequencing data using MMiRNA-Viewer.

EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer.

An ICT-Based Platform to Monitor Protocols in the Healthcare Environment.

Kodiak

Efficient monochromatic and bichromatic probabilistic reverse top-k query processing for uncertain big data

Quantifying the Connectivity of a Semantic Warehouse and Understanding its Evolution over Time

A Webgis Framework for Disseminating Processed Remotely Sensed on Land Cover Transformations

Design and development of semantic web-based system for computer science domain-specific information retrieval

Methods to Enhance Transformation in Near Real Time ETL

W-tree

SAFQuery: a simple and flexible advanced Web search interface
