Gold Standard Set Research Articles

Biological networks catalog the complex web of interactions between different molecules, typically proteins, within a cell. These networks are known to be highly modular, with groups of proteins associated with specific biological functions. Human diseases often arise from the dysfunction of one or more proteins in such a functional group. The ability to identify and automatically extract these modules has implications for understanding the etiology of different diseases as well as the functional roles of different protein modules in disease. The recent DREAM challenge posed the problem of identifying disease modules from six heterogeneous networks of proteins/genes. Many community detection algorithms exist, but not all of them are adaptable to the biological context, as these networks are densely connected and biologically relevant modules are quite small. The contribution of this study is threefold. First, we present a comprehensive assessment of many classic community detection algorithms for identifying non-overlapping communities in biological networks, and propose heuristics to identify small and structurally well-defined communities, which we call core modules. We evaluated performance over 180 GWAS datasets. Compared with traditional approaches, our proposed approach identified 50% more disease-relevant modules. Thus, we show that it is important to identify more compact modules for better performance. Next, we sought to understand the peculiar characteristics of disease-enriched modules and what causes standard community detection algorithms to detect so few of them. We performed a comprehensive analysis of the interaction patterns of known disease genes to understand the structure of disease modules, and show that merely treating the set of known disease genes as a module does not give good-quality clusters, as measured by typical metrics such as modularity and conductance. We go on to present a methodology that leverages these known disease genes and also includes their neighboring nodes in a module to form good-quality clusters, and subsequently extract a “gold-standard set” of disease modules. Lastly, we demonstrate, with justification, that “overlapping” community detection algorithms should be the preferred choice for disease module identification, since several genes participate in multiple biological functions.
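The idea of growing a module outward from known disease genes can be illustrated with a minimal sketch. This is not the authors' method: the greedy rule (add the neighbor that lowers conductance the most, stop when none does), the function names, and the toy node labels are all assumptions made for illustration. Conductance here is the number of edges leaving the module divided by the smaller of the two side volumes (sums of degrees).

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def conductance(adj, module):
    """Edges leaving the module divided by the smaller side volume."""
    module = set(module)
    cut = sum(1 for u in module for v in adj.get(u, ()) if v not in module)
    vol_in = sum(len(adj.get(u, ())) for u in module)
    vol_out = sum(len(nbrs) for nbrs in adj.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 1.0


def expand_module(adj, seeds):
    """Greedily add the neighbor that lowers conductance the most (assumed rule)."""
    module = set(seeds)
    best = conductance(adj, module)
    while True:
        frontier = {v for u in module for v in adj.get(u, ())} - module
        scored = [(conductance(adj, module | {v}), v) for v in sorted(frontier)]
        if not scored:
            break
        score, node = min(scored)
        if score >= best:  # no neighbor improves the module; stop
            break
        module.add(node)
        best = score
    return module, best


# Hypothetical toy network: two triangles joined by the edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
adj = build_graph(edges)

# Treat A and B as the "known disease genes" (seed set).
module, score = expand_module(adj, {"A", "B"})
print(sorted(module), round(score, 3))  # ['A', 'B', 'C'] 0.143
```

On this toy graph the seed set alone has conductance 0.5, while adding the neighbor C closes the triangle and drops it to 1/7, which mirrors the abstract's point that the bare disease gene set is a poor cluster until nearby nodes are included.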

Background: Cilia are specialized, hair-like structures that project from the cell bodies of eukaryotic cells. With increased understanding of the distribution and functions of various types of cilia, interest in these organelles is accelerating. To effectively use this great expansion in knowledge, this information must be made digitally accessible and available for large-scale analytical and computational investigation. Capture and integration of knowledge about cilia into existing knowledge bases, thus providing the ability to improve comparative genomic data analysis, is the objective of this work.
Methods: We focused on the capture of information about cilia as studied in the laboratory mouse, a primary model of human biology. The workflow developed establishes a standard for capture of comparative functional data relevant to human biology. We established the 310 closest mouse orthologs of the 302 human genes defined in the SYSCILIA Gold Standard set of ciliary genes. For the mouse genes, we identified biomedical literature for curation and used Gene Ontology (GO) curation paradigms to provide functional annotations from these publications.
Results: Employing a methodology for comprehensive capture of experimental data about cilia genes in structured, digital form, we established a workflow for curation of experimental literature detailing molecular function and roles of cilia proteins starting with the mouse orthologs of the human SYSCILIA gene set. We worked closely with the GO Consortium ontology development editors and the SYSCILIA Consortium to improve the representation of ciliary biology within the GO. During the time frame of the ontology improvement project, we have fully curated 134 of these 310 mouse genes, resulting in an increase in the number of ciliary and other experimental annotations.
Conclusions: We have improved the GO annotations available for mouse genes orthologous to the human genes in the SYSCILIA Consortium’s Gold Standard set. In addition, ciliary terminology in the GO itself was improved in collaboration with GO ontology developers and the SYSCILIA Consortium. These improvements to the GO terms for the functions and roles of ciliary proteins, along with the increase in annotations of the corresponding genes, enhance the representation of ciliary processes and localizations and improve access to these data during large-scale bioinformatic analyses.

Articles published on Gold Standard Set

Machine learning for phenotyping opioid overdose events.

The Embase UK filter: validation of a geographic search filter to retrieve research about the UK from OVID Embase.

Powerful gene set analysis in GWAS with the Generalized Berk-Jones statistic.

Adapting Community Detection Algorithms for Disease Module Identification in Heterogeneous Biological Networks.

Using SNOMED CT-encoded problems to improve ICD-10-CM coding—A randomized controlled experiment

Integrating genomic resources to present full gene and putative promoter capture probe sets for bread wheat

Automatic identification of relevant chemical compounds from patents.

Interrogation of genome-wide networks in biology: comparison of knowledge-based and statistical methods

Estimating post-editing time using a gold-standard set of machine translation errors

Development and validation of the PEPPER framework (Prenatal Exposure PubMed ParsER) with applications to food additives.

Context-specific interactions in literature-curated protein interaction databases

Development and validation of a search filter to identify equity-focused studies: reducing the number needed to screen

DynBench3D, a Web-Resource to Dynamically Generate Benchmark Sets of Large Heteromeric Protein Complexes

Harnessing citizen science through mobile phone technology to screen for immunohistochemical biomarkers in bladder cancer

Verifying Conceptual Domain Models with Human Computation: A Case Study in Software Engineering

Drug Repositioning by Integrating Known Disease-Gene and Drug-Target Associations in a Semi-supervised Learning Model.

Sensing the cilium, digital capture of ciliary data for comparative genomics investigations

The best choice of equipment to obtain high quality standardised results in intra-oral photography – a comparison between the common practice in the UK and the gold standard set by the literature

Maize GO Annotation-Methods, Evaluation, and Review (maize-GAMER).

A pilot study: a teaching electronic medical record for educating and assessing residents in the care of patients
