Abstract

Short tandem repeats (STRs) are abundant in genomic sequences and are known for comparatively high mutation rates; they are therefore thought to be a potent source of genetic diversity. In protein-coding sequences, STRs primarily encode disorder-promoting amino acids and are often located in intrinsically disordered regions (IDRs). STRs are frequently studied in the context of microsatellite instability (MSI) in cancer, with little focus on the connection between protein STRs and IDRs. We believe, however, that this relationship should be explicitly included when ascertaining STR functionality in cancer. Here we explore this notion using all canonical human proteins from SwissProt, in which we detected 3,699 STRs. Over 80% of these consisted completely of disorder-promoting amino acids. Of the amino acids in STR sequences, 62.1% were predicted to also be in an IDR, compared to 14.2% for non-repeat sequences. Over-representation analysis showed STR-containing proteins to be primarily located in the nucleus, where they perform protein- and nucleotide-binding functions and regulate gene expression. They were also enriched in cancer-related signaling pathways. Furthermore, we found enrichments of STR-containing proteins among those correlated with patient survival for cancers derived from eight different anatomical sites. Intriguingly, several of these cancer types are not known to have an MSI-high (MSI-H) phenotype, suggesting that protein STRs play a role in cancer pathology in non-MSI-H settings. Their intrinsic link with IDRs could therefore be an attractive topic of future research to further explore the role of STRs and IDRs in cancer. We speculate that our observations may be linked to the known dosage-sensitivity of disordered proteins, which could hint at a concentration-dependent gain-of-function mechanism in cancer for proteins containing STRs and IDRs.
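The overlap statistic quoted above (the share of STR residues that also lie in a predicted IDR) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the half-open coordinate convention and the toy intervals are all assumptions made for illustration, and a proteome-wide figure would pool residue counts over all proteins rather than use a single one.

```python
def residues_covered(intervals):
    """Expand half-open (start, end) intervals into a set of residue positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def fraction_in_idr(region_intervals, idr_intervals):
    """Fraction of residues in region_intervals that also fall inside an IDR."""
    region = residues_covered(region_intervals)
    if not region:
        return 0.0
    idr = residues_covered(idr_intervals)
    return len(region & idr) / len(region)


# Hypothetical toy protein; all coordinates are made up for illustration.
str_intervals = [(10, 20), (50, 58)]  # 18 residues annotated as STR
idr_intervals = [(5, 16), (52, 70)]   # residues predicted to be disordered

str_fraction = fraction_in_idr(str_intervals, idr_intervals)
print(f"{str_fraction:.3f}")  # 12 of the 18 STR residues overlap an IDR
```

The same function applied to the non-repeat residues of each protein would give the comparison value for non-repeat sequence.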

Highlights

  • Short Tandem Repeats (STRs), also known as microsatellites, are genomic motifs of 1–6 base pairs that are repeated back-to-back

  • Of the STR-containing proteins, 85.5% contained at least one intrinsically disordered region (IDR), not necessarily overlapping the STR. This proportion was substantially lower for non-STR proteins, of which 50.8% were predicted to contain an IDR. Of all STRs, 2,717 were homorepeats, consisting of repeating tracts of a single amino acid (AA)

  • We explored the occurrence of short tandem repeats and intrinsic disorder in a non-redundant set of human proteins spanning the proteome
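To make the notion of a homorepeat concrete, the following Python sketch finds runs of a single amino acid in a protein sequence. It is an illustration only: the function name `find_homorepeats`, the minimum run length of five residues and the toy sequence are assumptions, not the detection method or thresholds used in the study.

```python
import re


def find_homorepeats(sequence, min_length=5):
    """Return (start, end, residue) for each run of one amino acid.

    Coordinates are 0-based and half-open. min_length is an assumed
    threshold chosen for illustration.
    """
    # One residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence containing a poly-Q and a poly-S tract.
toy_protein = "MKQQQQQQQLPAASSSSSGT"
homorepeats = find_homorepeats(toy_protein)
for start, end, residue in homorepeats:
    print(residue, start, end, end - start)
```

The two-residue "AA" stretch in the toy sequence falls below the assumed threshold and is not reported.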


Summary

Introduction

Short Tandem Repeats (STRs), also known as microsatellites, are genomic motifs of 1–6 base pairs that are repeated back-to-back. STRs are estimated to make up around 3% of the complete human genome (Ellegren, 2004). They are highly polymorphic, with a mutation rate estimated to be several orders of magnitude higher than that of non-repeating sequence (Willems et al., 2014). The primary mode of mutation in STRs is contraction or expansion through the loss or gain of repeat units. The process held mainly responsible for this is replication slippage (Viguera et al., 2001). In this process, one of the two DNA strands 'slips' during replication, forming a hairpin-like structure. Depending on which of the two strands slips, this can lead to either insertions or deletions of repeat units.
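The definition above (a motif of 1–6 base pairs repeated back-to-back) can be sketched as a simple regular-expression scan in Python. This is a naive illustration for perfect repeats only, not a dedicated tandem repeat finder; the function name `find_dna_strs`, the minimum of three copies and the toy sequence are assumptions.

```python
import re


def find_dna_strs(sequence, max_motif=6, min_copies=3):
    """Return (start, end, motif, copies) for perfect tandem repeats.

    The lazy quantifier makes the scan try the shortest motif first at each
    position, so a tract like ATATAT is reported with motif AT, not ATAT.
    min_copies is an assumed threshold chosen for illustration.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for m in pattern.finditer(sequence):
        motif = m.group(1)
        copies = (m.end() - m.start()) // len(motif)
        hits.append((m.start(), m.end(), motif, copies))
    return hits


# Hypothetical toy sequence: a (CAG)4 tract and an (A)6 tract.
toy_dna = "GGTCAGCAGCAGCAGTCAAAAAAGT"
dna_strs = find_dna_strs(toy_dna)
for start, end, motif, copies in dna_strs:
    print(motif, copies, start, end)
```

In this picture, an expansion or contraction by replication slippage corresponds to the copy number of a reported tract changing by one or more whole repeat units.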
