Abstract

Background: Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work together to produce diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now widely used to probe these protein-DNA interactions and to profile gene expression at genome-wide scale. As the cost of DNA sequencing continues to fall, interpreting the ever-increasing amount of data generated represents a considerable challenge.

Results: We have developed ngs.plot, a standalone program that visualizes enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. Using a few examples, we show that ngs.plot is easy to use yet powerful enough to generate publication-ready figures.

Conclusions: We conclude that ngs.plot is a useful tool to help bridge the gap between massive datasets and genomic information in this era of big sequencing data.
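To make "enrichment patterns at functionally important regions" concrete, the sketch below computes an average coverage profile: per-base read coverage is collected in a fixed window around each site (for example, a transcription start site), minus-strand windows are flipped so upstream is always on the left, and the windows are averaged position by position. This is a minimal illustration of the general technique, not ngs.plot's actual code; the function name `average_profile`, the in-memory coverage list and the skip-at-chromosome-edge rule are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def average_profile(coverage: Sequence[float],
                    sites: Sequence[Tuple[int, str]],
                    flank: int) -> List[float]:
    """Mean coverage at each offset in [-flank, +flank] around each site.

    Hypothetical helper for illustration only.
    coverage: per-base read coverage for one chromosome (0-based).
    sites: (position, strand) pairs, strand '+' or '-'.
    Sites whose window runs off the chromosome are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos, strand in sites:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(coverage):
            continue  # window falls outside the chromosome: skip this site
        window = list(coverage[start:end])
        if strand == "-":
            window.reverse()  # orient so upstream is always on the left
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        return totals  # no usable sites: return an all-zero profile
    return [t / used for t in totals]


if __name__ == "__main__":
    cov = list(range(10))
    print(average_profile(cov, [(3, "+"), (6, "-")], flank=1))
```

Here the plus-strand site contributes the window `[2, 3, 4]` and the minus-strand site contributes `[5, 6, 7]` reversed to `[7, 6, 5]`, so the averaged profile is `[4.5, 4.5, 4.5]`. Flipping minus-strand windows is what lets signals from genes on both strands be stacked meaningfully.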

Highlights

  • Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work together to produce diverse phenotypes, is key to advancing our understanding of the mammalian genome

  • The ability of next-generation sequencing to produce more than one billion sequencing reads within a few days [1] has enabled the investigation of tens of thousands of biological events in parallel [2,3]

  • Designing a genome browser that can effectively manage the enormous amount of genomic information has become an important research topic in the past decade, with dozens of tools developed to date [6,7,8]


Summary

Background

Next-generation sequencing (NGS) technology has become an indispensable tool for studying genomics and epigenomics in recent years. As sequencing output has increased rapidly, this strategy soon became a major problem: deeper sequencing inevitably produces nonzero values in regions that were previously zero, so the RLE files grew too large and consumed a lot of memory during loading. Another challenge arose with epigenomic marks that have broad patterns of enrichment: their coverage vectors are dense and may consume a lot of memory.

Gene deserts, pericentromeres and subtelomeres are used to build a genome package on the fly for the “region analysis” utility (https://github.com/shenlabsinai/region_analysis), which performs location-based classification of CGIs and DHSs. In total, more than 60 million functional elements have been incorporated into ngs.plot’s database so far (Table 1). Differential chromatin modification sites were detected by diffReps [42] using default parameters, with the FDR cutoff set to 0.1.
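The memory problem with RLE files follows directly from how run-length encoding works: a coverage vector is stored as (value, run length) pairs, which is compact when long stretches share one value (typically zero) but degrades as coverage becomes dense and varied. The sketch below is a minimal illustration of generic run-length encoding, not ngs.plot's file format; the helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from typing import List, Sequence, Tuple


def rle_encode(coverage: Sequence[int]) -> List[Tuple[int, int]]:
    """Compress a per-base coverage vector into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for value in coverage:
        if runs and runs[-1][0] == value:
            # Same value as the current run: extend it by one base.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Value changed: start a new run.
            runs.append((value, 1))
    return runs


def rle_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into a per-base vector."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


if __name__ == "__main__":
    sparse = [0, 0, 0, 5, 5, 0, 0, 0]  # e.g. a narrow peak in an empty region
    dense = [1, 2, 1, 3, 2, 2, 1, 4]   # e.g. deep or broad coverage
    print(rle_encode(sparse))
    print(len(rle_encode(dense)), "runs for", len(dense), "bases")
```

The sparse vector compresses to three runs, `[(0, 3), (5, 2), (0, 3)]`, whereas the dense vector needs seven runs for eight bases; at two numbers per run, that is larger than the raw vector. Deeper sequencing and broad marks push real coverage toward the dense case, which is why RLE storage stopped paying off.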

Results and discussion
Conclusion
Metzker ML
