Abstract

Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. However, there is a lack of uniform standards for graphical data mining, which is also of central importance. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. Our software is a portable resource written in ANSI C that can be expected to work for almost all genomes in any computational configuration. Furthermore, we offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Our analysis and visualization toolkit represents a significant improvement in terms of performance and usability as compare to other existing programs. Thus, SeqCode has the potential to become a key multipurpose instrument for high-throughput professional analysis; further, it provides an extremely useful open educational platform for the world-wide scientific community. SeqCode website is hosted at http://ldicrocelab.crg.eu, and the source code is freely distributed at https://github.com/eblancoga/seqcode.

Highlights

  • Large-scale sequencing techniques to chart genomes are entirely consolidated

  • Powerful bioinformatic tools are available to manage this volume of data at a primary stage: (i) quality control profilers evaluate distinct scoring metrics on raw i­nformation[4,5,6]; (ii) mapping algorithms identify the location of each read on the g­ enome[7,8,9]; (iii) peak callers find clusters of reads significantly enriched in certain genomic regions in the sample map f­ile[10,11,12]; (iv) genome browsers are useful to visualize genome-wide binding profiles and p­ eaks[13,14,15,16]; and (v) other auxiliary applications convert intermediate files into the appropriate data ­formats[17,18,19,20]

  • Information on a particular genome assembly is loaded from two external files that must be supplied by the user: (i) the chromosome size file (ChromInfo.txt), and (ii) the gene transcript annotations, as provided by the RefSeq c­ onsortium[26]

Read more

Summary

Introduction

Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. There is a lack of uniform standards for graphical data mining, which is of central importance To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. We offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Current high-throughput sequencing techniques (e.g. ChIP-seq, ATAC-seq, and RNA-seq) can use a single run to identify the repertoire of functional characteristics of the genome. We first illustrate the main characteristics of SeqCode, introduce the collection of principal SeqCode features to perform high-quality graphical analysis of sequencing data, and propose a standardized nomenclature of representations. SeqCode is entirely focused on the graphical analysis of 1D genomic data (e.g. ChIP-seq, RNA-seq). We comprehensively review the existing literature on similar tools to evaluate our software in comparison to current approaches

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call