Abstract
Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. However, there is a lack of uniform standards for graphical data mining, which is also of central importance. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. Our software is a portable resource written in ANSI C that can be expected to work for almost all genomes in any computational configuration. Furthermore, we offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Our analysis and visualization toolkit represents a significant improvement in terms of performance and usability as compared to other existing programs. Thus, SeqCode has the potential to become a key multipurpose instrument for high-throughput professional analysis; further, it provides an extremely useful open educational platform for the world-wide scientific community. The SeqCode website is hosted at http://ldicrocelab.crg.eu, and the source code is freely distributed at https://github.com/eblancoga/seqcode.
Highlights
Large-scale sequencing techniques to chart genomes are entirely consolidated
Powerful bioinformatic tools are available to manage this volume of data at a primary stage: (i) quality control profilers evaluate distinct scoring metrics on raw information[4,5,6]; (ii) mapping algorithms identify the location of each read on the genome[7,8,9]; (iii) peak callers find clusters of reads significantly enriched in certain genomic regions in the sample map file[10,11,12]; (iv) genome browsers are useful to visualize genome-wide binding profiles and peaks[13,14,15,16]; and (v) other auxiliary applications convert intermediate files into the appropriate data formats[17,18,19,20].
Information on a particular genome assembly is loaded from two external files that must be supplied by the user: (i) the chromosome size file (ChromInfo.txt), and (ii) the gene transcript annotations, as provided by the RefSeq consortium[26].
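To make the first of these inputs concrete, the following is a minimal sketch of how one line of a chromosome size file could be parsed in C. It assumes that each line holds a chromosome name followed by its length in base pairs, separated by whitespace, and that any further columns can be ignored. The function name parse_chrom_line and the 64-character name limit are hypothetical choices for illustration and are not taken from the SeqCode source code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit on chromosome name length (including the final '\0'). */
#define MAX_CHROM_NAME 64

/*
 * Parse one line of a chromosome size file, assumed to look like
 *   <name><whitespace><size>[<whitespace>ignored columns...]
 * The caller must provide a name buffer of at least MAX_CHROM_NAME bytes.
 * Returns 1 on success, 0 if the line is empty or malformed.
 */
int parse_chrom_line(const char *line, char *name, long *size)
{
    char buf[MAX_CHROM_NAME];
    long value;

    assert(line != NULL && name != NULL && size != NULL);

    /* %63s reads at most 63 non-whitespace characters (plus '\0'),
       %ld reads the chromosome length; extra columns are not consumed. */
    if (sscanf(line, "%63s %ld", buf, &value) != 2 || value <= 0)
        return 0;

    strcpy(name, buf);   /* safe: buf holds at most MAX_CHROM_NAME bytes */
    *size = value;
    return 1;
}
```

A caller would typically read the file line by line with fgets and pass each line to parse_chrom_line, skipping any line for which it returns 0 (for example, a header or a blank line).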
Summary
Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. There is a lack of uniform standards for graphical data mining, which is of central importance. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. We offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Current high-throughput sequencing techniques (e.g. ChIP-seq, ATAC-seq, and RNA-seq) can use a single run to identify the repertoire of functional characteristics of the genome. We first illustrate the main characteristics of SeqCode, introduce the collection of principal SeqCode features to perform high-quality graphical analysis of sequencing data, and propose a standardized nomenclature of representations. SeqCode is entirely focused on the graphical analysis of 1D genomic data (e.g. ChIP-seq, RNA-seq). We comprehensively review the existing literature on similar tools to evaluate our software in comparison to current approaches.