Abstract

Over 40% of proteins in any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non‐trivial because their function depends on the biological context. We present IDR‐Screen, a framework for discovering functional IDRs in a high‐throughput manner by simultaneously assaying large numbers of DNA sequences that encode short disordered sequences. Functionality‐conferring patterns in the encoded protein sequences are then inferred through statistical learning. Using a yeast HSF1 transcription factor‐based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we find that the absence of positively charged residues, combined with redundant short sequence patterns of negatively charged and aromatic residues, is a generic feature of TAD functionality. We anticipate that using IDR‐Screen to investigate defined sequence libraries for specific functions can facilitate the discovery of novel functional regions of the disordered proteome and help clarify the impact of natural and disease variants in disordered segments.
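The sequence features named in the abstract can be made concrete with a small feature-extraction sketch. This is a hypothetical illustration, not the authors' statistical-learning model: it counts acidic (D, E), aromatic (F, W, Y), and basic (K, R) residues in a segment, treats "an aromatic residue with an acidic residue within two positions" as a stand-in for the short sequence patterns described above, and flags segments that have no K/R and at least two such patterns. The residue classes, the neighborhood size `max_gap`, the threshold `min_patterns`, the 20-residue scanning width, and all function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical residue classes (one-letter amino acid codes).
ACIDIC = set("DE")    # negatively charged at neutral pH
AROMATIC = set("FWY")
BASIC = set("KR")     # positively charged; H is left out as a simplifying assumption


@dataclass
class SegmentFeatures:
    length: int
    n_acidic: int
    n_aromatic: int
    n_basic: int
    n_acidic_aromatic: int  # aromatic residues with an acidic residue nearby


def count_acidic_aromatic(seq: str, max_gap: int = 2) -> int:
    """Count aromatic residues with at least one acidic residue within max_gap positions."""
    count = 0
    for i, aa in enumerate(seq):
        if aa in AROMATIC:
            lo, hi = max(0, i - max_gap), min(len(seq), i + max_gap + 1)
            if any(seq[j] in ACIDIC for j in range(lo, hi) if j != i):
                count += 1
    return count


def segment_features(seq: str, max_gap: int = 2) -> SegmentFeatures:
    """Compute simple composition and pattern features for one peptide segment."""
    seq = seq.upper()
    return SegmentFeatures(
        length=len(seq),
        n_acidic=sum(aa in ACIDIC for aa in seq),
        n_aromatic=sum(aa in AROMATIC for aa in seq),
        n_basic=sum(aa in BASIC for aa in seq),
        n_acidic_aromatic=count_acidic_aromatic(seq, max_gap),
    )


def looks_tad_like(seq: str, min_patterns: int = 2) -> bool:
    """Hypothetical rule: no K/R and at least min_patterns acidic-aromatic patterns."""
    f = segment_features(seq)
    return f.n_basic == 0 and f.n_acidic_aromatic >= min_patterns


def scan_windows(protein: str, width: int = 20, step: int = 5) -> list[tuple[int, str]]:
    """Return (start, window) pairs for fixed-width windows that pass looks_tad_like."""
    hits = []
    for start in range(0, max(len(protein) - width, 0) + 1, step):
        window = protein[start:start + width]
        if looks_tad_like(window):
            hits.append((start, window))
    return hits
```

For example, `scan_windows("KKKKKDEWFS", width=5, step=5)` returns `[(5, 'DEWFS')]`: the all-lysine window is rejected for its positive charge, while the second window has two aromatic residues next to acidic ones. A real screen would learn how to weight such features from assay data rather than apply a fixed cutoff like this one.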

Highlights

  • Understanding how the amino acid sequence of a protein contributes to its function is a problem of long-standing interest

  • In addition to the DNA-binding domain that binds promoter DNA, transcription factors (TFs) harbor transactivation domains (TADs), which are typically fewer than 20 residues long and intrinsically disordered (Sigler, 1988)

  • The current mechanistic model is that TADs mediate interactions that recruit the transcriptional machinery, which is critical for transcription initiation (Ptashne & Gann, 1997)



Introduction

Understanding how the amino acid sequence of a protein contributes to its function (the sequence–function relationship) is a problem of long-standing interest. With the availability of genome sequences, it has become clear that a large fraction of any eukaryotic proteome consists of protein segments that do not autonomously fold into a defined tertiary structure, although they may contain secondary structural elements (van der Lee et al, 2014). Proteins typically use these intrinsically disordered regions (IDRs) to carry out their functions by mediating transient protein interactions (Tompa et al, 2014; Van Roey et al, 2014). Computational approaches have estimated that there could be up to a million functional IDRs in the human proteome (Tompa et al, 2014), but only a small fraction of them have been characterized so far (Gouw et al, 2017), which limits our understanding of the disordered proteome.
