Abstract

Hundreds of millions of dollars have been invested in gene regulation projects such as ENCODE, which attempt to decipher the transcriptional regulatory network of the human genome in order to interpret disease mechanisms. However, few labs have found ways to translate these data into a molecular view of evolutionary biology or disease. We have developed a workflow that combines ChIP-Seq data with molecular models of transcription factor (TF)-DNA interaction. Together, the compilation of evolutionary sequences, modeling of protein-DNA interaction, structural DNA scanning of consensus binding motifs, and molecular dynamics simulations provide a biophysical model of TF-DNA interaction. This approach has yielded several critical findings to date: 1) evolution has selected for a unique double E-box sequence for TWIST DNA binding through a homo-heterodimer model, and genetic variants associated with craniofacial syndromes disrupt this mechanism; 2) methyl-CpG binding domain (MBD) proteins are conserved from plants to animals for recognizing 5-methylcytosine, with invertebrate species such as Drosophila showing progressive development of loss-of-function variants; 3) cancer variants are enriched at functional DNA-binding amino acids within SOX HMG proteins; and 4) variants associated with neurological disorders occur at conserved sites of EBF3 and disrupt DNA binding. Applying these tools highlights the value of translating data from projects such as ENCODE into biophysical mechanisms of protein evolution and disease.

