Abstract

Background: Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications.

Results: An automated and efficient system for detecting human polymorphic STRs at the genome scale is proposed in this study. Assembled contigs from next generation sequencing data were aligned and calibrated according to selected reference sequences. To verify identified polymorphic STRs, human genomes from the 1000 Genomes Project were employed for comprehensive analyses, and STR markers from the Combined DNA Index System (CODIS) and disease-related STR motifs were also applied as cases for evaluation. In addition, we analyzed STR variations for highly conserved homologous genes and human-unique genes. In total 477 polymorphic STRs were identified from 492 human-unique genes, among which 26 STRs were retrieved and clustered into three different groups for efficient comparison.

Conclusions: We have developed an online system that efficiently identifies polymorphic STRs and provides novel distinguishable STR biomarkers for different levels of specificity. Candidate polymorphic STRs within a personal genome could be easily retrieved and compared to the constructed STR profile through query keywords, gene names, or assembled contigs.

Highlights

  • Short tandem repeats (STRs) are abundant in human genomes

  • We performed a statistical analysis of the STR distributions in several datasets, including chromosomal genes, Combined DNA Index System (CODIS) genes, disease-related genes, cross-species homologous genes, and genes that are unique to humans

  • We evaluated the total length of STRs (TLSTR), total length of selected genes (TLgene), total number of genes (TNgene), total number of STRs (TNSTR), total number of polymorphic STRs (TNpSTR), density of polymorphic STRs, and occurrence ratio of polymorphic STRs in each chromosome
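The per-chromosome quantities listed above can be illustrated with a small sketch. The record layout (`GeneRecord`, `StrRecord`) and the function name are hypothetical, and the two derived measures are assumptions rather than the authors' definitions: density is taken here as polymorphic STRs per kilobase of gene sequence, and occurrence ratio as TNpSTR divided by TNSTR.

```python
from dataclasses import dataclass, field


@dataclass
class StrRecord:
    length: int        # STR length in base pairs
    polymorphic: bool  # True if repeat counts differ between individuals


@dataclass
class GeneRecord:
    chrom: str
    length: int        # gene length in base pairs
    strs: list = field(default_factory=list)


def chromosome_stats(genes):
    """Aggregate the STR statistics per chromosome."""
    stats = {}
    for gene in genes:
        s = stats.setdefault(
            gene.chrom,
            {"TLSTR": 0, "TLgene": 0, "TNgene": 0, "TNSTR": 0, "TNpSTR": 0},
        )
        s["TLgene"] += gene.length
        s["TNgene"] += 1
        s["TNSTR"] += len(gene.strs)
        s["TLSTR"] += sum(r.length for r in gene.strs)
        s["TNpSTR"] += sum(1 for r in gene.strs if r.polymorphic)
    for s in stats.values():
        # Assumed definition: polymorphic STRs per kilobase of gene sequence.
        s["pSTR_density_per_kb"] = (
            1000 * s["TNpSTR"] / s["TLgene"] if s["TLgene"] else 0.0
        )
        # Assumed definition: fraction of all STRs that are polymorphic.
        s["pSTR_ratio"] = s["TNpSTR"] / s["TNSTR"] if s["TNSTR"] else 0.0
    return stats


example_genes = [
    GeneRecord("chr1", 2000, [StrRecord(20, True), StrRecord(12, False)]),
    GeneRecord("chr1", 3000, [StrRecord(30, True)]),
    GeneRecord("chr2", 1000, []),
]
example_stats = chromosome_stats(example_genes)
```

With these toy records, `example_stats["chr1"]` holds the five totals plus the two derived ratios, and a chromosome without any STRs simply reports zero for both ratios instead of dividing by zero.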


Summary

Introduction

Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications. Recent publications have shown that next generation sequencing (NGS) offers a low-cost and time-efficient route to polymorphic STR marker discovery, even when no reference sequences are provided [15]. The latest tools have focused on STR marker discovery through NGS read analysis. Hoffman and Nichols proposed a manual method for in silico STR marker screening [17]. We sought to develop an efficient identification system capable of detecting conserved and polymorphic STRs across sequence reads from different individuals. The proposed method detects STR polymorphisms without manual curation, can be applied directly to the efficient identification of conserved and polymorphic STR markers, and could accelerate functional analysis of regulatory STR motifs.
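To make the distinction between conserved and polymorphic STRs concrete, the following is a minimal sketch and not the system described in the paper. It assumes each individual's sequence has already been aligned to the same reference locus, treats an STR as a perfect repeat of a 2 to 6 bp motif with at least three copies (both thresholds are assumptions), and calls a locus polymorphic when individuals differ in motif or copy number. The function names are hypothetical.

```python
import re

# Assumed thresholds: motif of 2-6 bases, at least 3 consecutive copies.
# The lazy quantifier makes the regex prefer the shortest repeating motif.
STR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_strs(seq):
    """Return (start, motif, copies) for each perfect tandem repeat in seq."""
    hits = []
    for match in STR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


def longest_str(seq):
    """Return the STR hit spanning the most bases, or None if there is none."""
    hits = find_strs(seq)
    if not hits:
        return None
    return max(hits, key=lambda h: len(h[1]) * h[2])


def classify_locus(seqs_by_individual):
    """Label one aligned locus as 'polymorphic' or 'conserved'.

    Returns the label and each individual's (motif, copies) allele.
    """
    alleles = {}
    for name, seq in seqs_by_individual.items():
        hit = longest_str(seq)
        alleles[name] = (hit[1], hit[2]) if hit else None
    distinct = {a for a in alleles.values() if a is not None}
    status = "polymorphic" if len(distinct) > 1 else "conserved"
    return status, alleles


# Toy locus: identical flanks, different numbers of GATA copies.
toy_locus = {
    "ind1": "TTGCC" + "GATA" * 8 + "CCTGA",
    "ind2": "TTGCC" + "GATA" * 11 + "CCTGA",
    "ind3": "TTGCC" + "GATA" * 8 + "CCTGA",
}
toy_status, toy_alleles = classify_locus(toy_locus)
```

A real pipeline would also need to handle imperfect repeats, sequencing errors, and locus matching through flanking sequence, none of which this sketch attempts.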
