Abstract

Hit selection from high-throughput assays remains a critical bottleneck in realizing the potential of omic-scale studies in biology. Widely used methods such as setting of cutoffs, prioritizing pathway enrichments, or incorporating predicted network interactions offer divergent solutions yet are associated with critical analytical trade-offs. The specific limitations of these individual approaches and the lack of a systematic way by which to integrate their rankings have contributed to limited overlap in the reported results from comparable genome-wide studies and costly inefficiencies in secondary validation efforts. Using comparative analysis of parallel independent studies as a benchmark, we characterize the specific complementary contributions of each approach and demonstrate an optimal framework to integrate these methods. We describe selection by iterative pathway group and network analysis looping (SIGNAL), an integrated, iterative approach that uses both pathway and network methods to optimize gene prioritization. SIGNAL is accessible as a rapid user-friendly web-based application (https://signal.niaid.nih.gov). A record of this paper's transparent peer review is included in the Supplemental information.

Highlights

  • High-throughput approaches, such as RNA and CRISPR-based screens, next-generation sequencing methods, and proteomic analysis, permit the unbiased measurement of the contribution of each gene in the genome to the outcome of a specific biological process; these methods continue to be some of the most powerful tools in research biology (Heckl and Charpentier, 2015; Gilbert et al, 2014; Moffat et al, 2006; Lee et al, 2003)

  • Building on recent advances in statistical normalization methods such as edgeR (McCarthy et al, 2012), DESeq2 (Love et al, 2014), and MAGeCK (Li et al, 2014), the bioinformatic approaches widely applied after data normalization and candidate ranking fall into three major classes: optimizing the setting of cutoffs, prioritizing based on the representation of preset gene groups or pathways, and expanding the list of hits based on predicted interaction networks (Birmingham et al, 2009; Tseng et al, 2012). Although these methods provide differing solutions to the challenge of candidate prioritization, their corrective approaches are often associated with analytical trade-offs relating to error correction, novelty identification, and interpretability. In addition to these challenges, two critical gaps persist: the absence of a systematic way to combine these solutions so that they yield the greatest additive benefit to hit selection accuracy, and the barrier these analyses pose for experimentalists who may lack the computational expertise required to implement them

  • The three independent studies of essential proteins required for early infection of HIV, known as HIV host dependency factors (HDFs), are among the most frequently cited examples of the high discordance of hit identification between parallel high-throughput assays (Hirsch, 2010; Zhu et al, 2014)


Summary

Introduction

High-throughput approaches, such as RNA and CRISPR-based screens, next-generation sequencing methods, and proteomic analysis, permit the unbiased measurement of the contribution of each gene in the genome to the outcome of a specific biological process; these methods continue to be some of the most powerful tools in research biology (Heckl and Charpentier, 2015; Gilbert et al, 2014; Moffat et al, 2006; Lee et al, 2003). Building on recent advances in statistical normalization methods such as edgeR (McCarthy et al, 2012), DESeq2 (Love et al, 2014), and MAGeCK (Li et al, 2014), the bioinformatic approaches widely applied after data normalization and candidate ranking fall into three major classes: optimizing the setting of cutoffs, prioritizing based on the representation of preset gene groups or pathways, and expanding the list of hits based on predicted interaction networks (Birmingham et al, 2009; Tseng et al, 2012). Although these methods provide differing solutions to the challenge of candidate prioritization, their corrective approaches are often associated with analytical trade-offs relating to error correction, novelty identification, and interpretability. In addition to these challenges, two critical gaps persist: the absence of a systematic way to combine these solutions so that they yield the greatest additive benefit to hit selection accuracy, and the barrier these analyses pose for experimentalists who may lack the computational expertise required to implement them.
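To make the three classes of approach concrete, the following is a minimal Python sketch of how a cutoff, a pathway enrichment step, and a network expansion step could be chained in a loop. It is an illustration only and is not the SIGNAL implementation: the function names, the hypergeometric enrichment test, the `alpha` threshold, the stopping rule, and the toy gene, pathway, and network data are all assumptions introduced here.

```python
from math import comb


def hypergeom_sf(k, universe, pathway_size, n_hits):
    """P(overlap >= k) when n_hits genes are drawn from `universe` genes,
    of which `pathway_size` belong to the pathway (hypergeometric tail)."""
    upper = min(pathway_size, n_hits)
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, n_hits - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(universe, n_hits)


def enriched_pathways(hits, pathways, universe, alpha=0.05):
    """Names of pathways over-represented among the current hits."""
    enriched = set()
    for name, members in pathways.items():
        k = len(hits & members)
        if k and hypergeom_sf(k, universe, len(members), len(hits)) < alpha:
            enriched.add(name)
    return enriched


def iterate_selection(high, medium, pathways, network, universe,
                      alpha=0.05, max_iter=20):
    """Hypothetical loop: alternate pathway and network steps until the
    hit set stops changing (or max_iter is reached)."""
    candidates = high | medium   # genes that passed a lenient cutoff
    hits = set(high)             # genes that passed a stringent cutoff
    for _ in range(max_iter):
        # Pathway step: keep candidates in pathways enriched among hits.
        enriched = enriched_pathways(hits, pathways, universe, alpha)
        pathway_genes = set().union(*(pathways[p] for p in enriched))
        pathway_hits = candidates & pathway_genes
        # Network step: add candidates that have a predicted interaction
        # with a pathway-supported hit.
        network_hits = {
            g for g in candidates if network.get(g, set()) & pathway_hits
        }
        new_hits = pathway_hits | network_hits
        if new_hits == hits:
            break
        hits = new_hits
    return hits


# Toy data (entirely made up for illustration).
pathways = {"P1": {"A", "B", "C", "D"}, "P2": {"X", "Y", "Z", "W"}}
network = {"N": {"A"}, "Q": {"Y"}}   # gene -> predicted interactors
high = {"A", "B", "X"}               # stringent-cutoff hits
medium = {"C", "N", "Q"}             # lenient-cutoff candidates

selected = iterate_selection(high, medium, pathways, network, universe=100)
print(sorted(selected))  # ['A', 'B', 'C', 'N']
```

In this toy run, the medium-confidence genes C (pathway member) and N (network neighbor of A) are promoted, while X, a high-confidence hit with no pathway or network support, is dropped. Whether unsupported high-confidence hits should be demoted in this way is a design choice of the sketch, not a claim about the published method.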

