Abstract

Background: Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information.

Results: We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate the gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files.

Conclusions: With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net.

Highlights

  • Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample

  • Using Sequence Alignment/Map (SAM)/Binary SAM (BAM) files generated from short read alignments, SAMMate implements an efficient and fast algorithm to calculate a base-wise signal map

  • SAMMate is able to use short reads originating from both exons and exon-exon junctions to accurately calculate gene expression scores
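To make the idea of a base-wise signal map concrete, the following is a minimal sketch of one common way to compute per-base read depth from aligned intervals, using a difference array followed by a running sum. It is not SAMMate's own algorithm, which is not described in this summary. The function name `basewise_coverage` and the 0-based, half-open `(start, end)` interval convention are assumptions made for illustration; a read spanning an exon-exon junction would contribute one interval per aligned block.

```python
from itertools import accumulate


def basewise_coverage(intervals, chrom_length):
    """Return per-base read depth for one chromosome.

    `intervals` holds 0-based, half-open (start, end) aligned blocks.
    This interval convention is an assumption of the sketch, not
    something taken from the SAMMate documentation.
    """
    # diff[i] records the change in depth when moving onto position i.
    diff = [0] * (chrom_length + 1)
    for start, end in intervals:
        # Clip blocks that run past the chromosome boundaries.
        start = max(start, 0)
        end = min(end, chrom_length)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1
    # The running sum of the changes gives the depth at every base.
    return list(accumulate(diff[:-1]))


if __name__ == "__main__":
    # Two overlapping reads on a 6-base toy chromosome.
    print(basewise_coverage([(0, 3), (2, 5)], 6))  # [1, 1, 2, 1, 1, 0]
```

The difference-array approach touches each read twice and each base once, so the cost grows with the number of reads plus the chromosome length rather than with their product.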


Summary

Results

Overview: Using the standard reference genome annotation files, SAMMate allows users to accurately calculate the gene expression abundance scores for all annotated genes using RNA-seq data.

Algorithmic and computational contributions: SAMMate uses a novel mapping and sorting strategy for ultrafast, efficient calculation of gene expression abundance scores and generation of wiggle files for visualization.

Biological case study: Comparing gene expression scores generated using SAMMate, TopHat and Novoalign to predict miRNA targets. We have studied a pair of control and treatment transcriptomes.

A signal map is a common input for a number of frequently performed sequential analyses to detect a wide range of ...

Figure 6: Comparison of RPKM Gene Expression Scores Reported by SAMMate, Novoalign and TopHat.

Key Feature: Generating wiggle files for visualization. Biomedical researchers need to visualize the alignment results stored in SAM files in order to examine possible gene structure alterations between case and control studies.

Key Feature: Generating an alignment report. Short read alignment statistics provide indispensable resources to examine the alignment quality as well as ...
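The RPKM scores named in the Figure 6 caption follow the standard definition of reads per kilobase of interval per million mapped reads. The sketch below shows that arithmetic only. The function name `rpkm` and its parameter names are hypothetical, and how SAMMate itself attributes exon and junction reads to an interval is not specified in this summary; here the caller is assumed to pass a read count that already includes both.

```python
def rpkm(read_count, interval_length_bp, total_mapped_reads):
    """Reads Per Kilobase of interval per Million mapped reads.

    `read_count` is assumed to already combine exon reads and
    exon-exon junction reads assigned to the interval.
    """
    if interval_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("interval length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (interval_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Normalizing by both interval length and library size is what allows scores to be compared across genes of different lengths and across samples of different sequencing depth, as in the control and treatment comparison described above.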

