The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is caused by 36 or more CAG repeats in the huntingtin (HTT) gene. Second, extreme somatic expansions of up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. New sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are therefore crucial. Various strategies for enriching the target locus over background DNA, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we explore how sequencing-based methods have evolved to meet increasing scientific demands. We also highlight experimental and bioinformatic challenges, with the aim of providing a guide to accurate triplet repeat characterization for diagnostic and therapeutic purposes.
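Why both length and sequence matter can be illustrated with a minimal Python sketch. It assumes reads have already been trimmed so that they start at the first codon of the repeat tract; the function `profile_repeat` and the `RepeatProfile` fields are hypothetical names for illustration, not part of any published pipeline. The two example alleles are modeled on the CAA-CAG interruption described at the 3' end of the HTT CAG tract and on a variant that lacks it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Both CAG and CAA encode glutamine, so swapping one for the other is synonymous.
GLN_CODONS = {"CAG", "CAA"}


@dataclass
class RepeatProfile:
    pure_cag: int                          # uninterrupted CAG codons from the start
    gln_codons: int                        # consecutive glutamine codons (CAG or CAA)
    interruptions: List[Tuple[int, str]]   # (codon index, codon) for non-CAG glutamine codons


def profile_repeat(seq: str) -> RepeatProfile:
    """Hypothetical helper: profile a read assumed to start in frame at the repeat."""
    seq = seq.upper()
    # Split into complete codons, dropping any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

    # Count the pure CAG run until the first codon that is not CAG.
    pure_cag = 0
    for codon in codons:
        if codon != "CAG":
            break
        pure_cag += 1

    # Count the glutamine-encoding tract and record synonymous interruptions.
    gln = 0
    interruptions: List[Tuple[int, str]] = []
    for i, codon in enumerate(codons):
        if codon not in GLN_CODONS:
            break
        gln += 1
        if codon != "CAG":
            interruptions.append((i, codon))

    return RepeatProfile(pure_cag, gln, interruptions)


if __name__ == "__main__":
    # Illustrative allele with the CAA-CAG interruption after 17 pure CAGs.
    interrupted = "CAG" * 17 + "CAA" + "CAG" + "CCGCCA" + "CCG" * 7
    # Illustrative allele lacking the interruption (19 pure CAGs).
    uninterrupted = "CAG" * 19 + "CCGCCA" + "CCG" * 7
    for name, read in [("interrupted", interrupted), ("uninterrupted", uninterrupted)]:
        print(name, len(read), profile_repeat(read))
```

Both example reads are 84 nt long and encode 19 consecutive glutamines, so a length-only readout would treat them as identical; only the codon-level profile reveals that the pure CAG run is 17 in one and 19 in the other.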