Abstract
Background: Whole-genome sequencing is performed routinely as a means to identify polymorphic genetic loci such as short tandem repeat (STR) loci. We have developed a simple, freely available tool, called pSTR Finder, for identifying putative polymorphic STR loci in genome-wide sequence data. The program cross-compares the STR sequences that the Tandem Repeats Finder reports for multiple genome samples supplied in FASTA format. These comparisons generate reports listing identical, polymorphic, and different STR loci when comparing two samples.
Methods: The web site http://forensic.mc.ntu.edu.tw:9000/PSTRWeb/Default has been developed as a means to identify polymorphic STR loci within complex mass genome sequences. The program was developed to generate a series of user-friendly reports.
Results: As proof of concept for the program, four FASTA genome sequence samples of human chromosome X (AC_000155.1, CM000685.1, NC_018934.2, and CM000274.1) were obtained from GenBank and analyzed for the presence of putative STR regions. With AC_000155.1 as the initial reference sequence, 5443 identical and 4305 polymorphic STR loci were identified using a repeat unit of 1–6 bp and 10 bp of flanking sequence on either side of each putative STR locus. A reliability test compared five FASTA samples, which had sections of DNA sequence removed to mimic partial or fragmented DNA sequences, to determine whether pSTR Finder can efficiently and consistently find identical, polymorphic, and different STR loci.
Conclusions: From the mass of DNA sequence data, the program was found to reproducibly identify polymorphic STR loci and generate user-friendly reports detailing the number and location of these potential polymorphic loci. This freely available program was found to be a useful tool for finding polymorphic STRs within whole-genome sequence data in forensic genetic studies.
Electronic supplementary material: The online version of this article (doi:10.1186/s13323-015-0027-x) contains supplementary material, which is available to authorized users.
Highlights
Whole-genome sequencing is performed routinely as a means to identify polymorphic genetic loci such as short tandem repeat loci.
A number of software programs have been developed for this purpose when dealing with data from massively parallel sequencing, such as MyFLq [10], lobSTR [11], and RepeatFinder [12]. While some programs are useful in identifying a putative short tandem repeat (STR), not all are designed to indicate whether the locus is polymorphic or to pull out the flanking DNA.
We have developed pSTR Finder (pSTR) to efficiently analyze multiple-genome sequence samples for the presence of STR loci using Tandem Repeats Finder (TRF) [12]. pSTR accepts sample data in the FASTA format and utilizes TRF
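To make the idea of turning FASTA input into a list of STR loci with flanking DNA more concrete, here is a minimal Python sketch. It is not Tandem Repeats Finder and not pSTR Finder's own code: the regular-expression scan is a simplified stand-in that finds only perfect repeats (TRF also handles imperfect ones), and the names `StrLocus`, `read_fasta`, `find_perfect_strs`, and the `min_copies` threshold are hypothetical choices made for illustration. The 1–6 bp repeat unit and 10 bp flank mirror the settings quoted in the abstract.

```python
import re
from typing import Dict, Iterator, List, NamedTuple


class StrLocus(NamedTuple):
    start: int        # 0-based start of the repeat tract
    motif: str        # repeat unit, 1-6 bp
    copies: int       # number of consecutive copies of the unit
    left_flank: str   # up to `flank` bases before the tract
    right_flank: str  # up to `flank` bases after the tract


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record id.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_perfect_strs(seq: str, min_copies: int = 3,
                      flank: int = 10) -> Iterator[StrLocus]:
    """Yield perfect tandem repeats with a 1-6 bp unit (assumed stand-in for TRF)."""
    # Lazy {1,6}? tries the shortest unit first, so "ACACAC" is reported
    # with motif "AC" rather than "ACAC".
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        yield StrLocus(
            start=match.start(),
            motif=motif,
            copies=len(match.group(0)) // len(motif),
            left_flank=seq[max(0, match.start() - flank):match.start()],
            right_flank=seq[match.end():match.end() + flank],
        )


# Toy input invented for this example; it is not taken from GenBank.
demo = ">demo_record toy sequence\nGATTACAGGT\nACACACACAC\nTTGACCGTAA\n"
for name, sequence in read_fasta(demo).items():
    for locus in find_perfect_strs(sequence):
        print(name, locus)
```

Keeping the flanks alongside each repeat gives every locus a signature that does not depend on its coordinate, which matters when the input is a partial or fragmented sequence.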
Summary
Whole-genome sequencing is performed routinely as a means to identify polymorphic genetic loci such as short tandem repeat loci. We have developed a simple tool, called pSTR Finder, which is freely available as a means of identifying putative polymorphic short tandem repeat (STR) loci from data generated from genome-wide sequences. The program performs cross comparisons on the STR sequences generated using the Tandem Repeats Finder based on multiple-genome samples in a FASTA format. These comparisons generate reports listing identical, polymorphic, and different STR loci when comparing two samples. The pSTR program is designed to analyze all input samples to discover and record putative polymorphic STR loci, regardless of whether the input sample is a complete genome or only fractions of one. We have found this program to be highly efficient when screening for potential polymorphic STR loci from genome-wide sequences and a major improvement on the current situation, in that polymorphic STR loci can be identified rapidly from a large dataset.
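The cross comparison described above can be illustrated with a short Python sketch. The text does not spell out how the three report categories are defined, so the rules used here are assumptions: a locus is matched between samples by its flanking sequences and repeat unit, it is "identical" when the copy number agrees, "polymorphic" when the copy number differs, and "different" when it is found in only one of the two samples. `StrRecord`, `index_by_locus`, and `compare_samples` are hypothetical names, not part of pSTR Finder.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class StrRecord(NamedTuple):
    left_flank: str   # e.g. 10 bp upstream of the repeat tract
    motif: str        # repeat unit
    copies: int       # number of tandem copies
    right_flank: str  # e.g. 10 bp downstream of the repeat tract


LocusKey = Tuple[str, str, str]


def index_by_locus(records: Sequence[StrRecord]) -> Dict[LocusKey, int]:
    """Map (left flank, motif, right flank) to copy number.

    Assumption: the signature is unique within a sample; if it repeats,
    the last record wins.
    """
    return {(r.left_flank, r.motif, r.right_flank): r.copies for r in records}


def compare_samples(reference: Sequence[StrRecord],
                    query: Sequence[StrRecord]) -> Dict[str, List[LocusKey]]:
    """Classify loci from two samples under the assumed category rules."""
    ref = index_by_locus(reference)
    qry = index_by_locus(query)
    report: Dict[str, List[LocusKey]] = {
        "identical": [], "polymorphic": [], "different": [],
    }
    for key in sorted(ref.keys() | qry.keys()):
        if key in ref and key in qry:
            label = "identical" if ref[key] == qry[key] else "polymorphic"
        else:
            label = "different"  # seen in only one of the two samples
        report[label].append(key)
    return report


# Toy records invented for this example.
sample_a = [
    StrRecord("GATTACAGGT", "AC", 5, "TTGACCGTAA"),
    StrRecord("CCGTTAGCAT", "GATA", 8, "TGGCATTCAG"),
    StrRecord("AAGCTTGGCA", "T", 12, "CGGATCCTAG"),
]
sample_b = [
    StrRecord("GATTACAGGT", "AC", 5, "TTGACCGTAA"),
    StrRecord("CCGTTAGCAT", "GATA", 10, "TGGCATTCAG"),
    StrRecord("TTGCAAGGCT", "CA", 7, "GGATTCAACG"),
]
for label, keys in compare_samples(sample_a, sample_b).items():
    print(label, len(keys))
```

Because loci are keyed on sequence content and not on position, the same comparison can in principle be applied to a complete genome or to fragments of one, which is the use case the summary emphasizes.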