Abstract
BackgroundScreening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility.MethodsWe characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders.ResultsWe identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings.ConclusionsWe have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
Highlights
Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders
We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs
We observed a similar trend among genotyped normal and intermediate/premutation alleles in Burrows-Wheeler Aligner (BWA) data, except that GangSTR did not correctly identify any intermediate/premutation alleles. As they only rely on spanning reads, lobSTR, HipSTR, and RepeatSeq did not detect any full-mutations in either Isaac- or BWA-aligned whole-genome sequencing (WGS) data
Summary
Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. Accurate genotyping (i.e., the determination of the number of copies of repeat units in an STR) is critical to the molecular diagnosis of STR expansion disorders as repeat length often influences disease severity and age of onset of clinical symptoms [3]. Repeat length determines an STR’s allelic class (normal, intermediate, premutation, or full-mutation), which may differ with respect to the associated disease phenotype [3, 4]. Expanded alleles tend to further increase in repeat length during intergenerational transmission, and as a result, genetic anticipation (the earlier and more severe manifestation of disease symptoms with each successive generation) is common in repeat expansion disorders [8]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.