Abstract

The recent arrival of ultra-high-throughput, next-generation sequencing (NGS) technologies has revolutionized genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. The rapid deployment of NGS in a variety of sequencing-based experiments has led to a fast accumulation of massive amounts of sequencing data. To process this new type of data, a torrent of increasingly sophisticated algorithms and software tools is emerging to support the analysis stage of NGS applications. In this article, we strive to comprehensively identify the critical challenges that arise at all stages of NGS data analysis and provide an objective overview of what has been achieved in existing work. At the same time, we highlight selected areas that need much further research to improve our ability to extract as much information as possible from NGS data. The article focuses on applications dealing with ChIP-Seq and RNA-Seq.

Highlights

  • Much like the development of microarray technology for measuring gene expression in the late 1990s and early 2000s, the development of technologies for high-throughput sequencing, termed next-generation sequencing (NGS) technologies, is having an impact on the types of questions that biologists can ask these days

  • We focus on two types of experiments that can be done using the NGS technology

  • Peaks in ChIP-Seq data are much sharper and narrower than those in ChIP-chip data because of the superior resolution of ChIP-Seq. For inference using both sources of data, Choi et al. proposed a hierarchical hidden Markov model (HHMM) for an integrated analysis of ChIP-chip and ChIP-Seq data [94]


Summary

Introduction

Much like the development of microarray technology for measuring gene expression in the late 1990s and early 2000s, the development of technologies for high-throughput sequencing, termed next-generation sequencing (NGS) technologies, is having an impact on the types of questions that biologists can ask. Already, these technologies have resulted in a multitude of high-impact studies with very diverse biological applications. While the applications given so far have involved data from humans, NGS technologies have also been applied to model organisms, such as yeast [18], bacteria [19,20] and the mouse [5], as well as to ancient species [21,22]. We hope that by the end of this article, statisticians, computer science researchers and data analysts will have a better sense of the experiments performed to generate the data, as well as the issues involved in their analysis.

Experimental Platform
Mapping reads from NGS experiments
Statistical methods for ChIP-Seq experiments
Follow-up analysis
Combining ChIP-Seq with ChIP-chip data
RNA-Seq experiments: measuring gene expression
Experimental design considerations
Findings
Conclusion and Future Directions