Abstract

In addition to sequence readout, DNA shape readout is an important mechanism of transcription factor target site recognition. Several machine learning-based models of transcription factor–DNA interactions that consider DNA shape features have been developed in recent years. Here, we present a new biophysical model of protein–DNA interactions that integrates DNA shape properties. It is based on the neighbor dinucleotide dependency model BayesPI2, with the new parameters restricted to a subspace spanned by the dinucleotide form of DNA shape features. This restriction allows the new parameters to be interpreted biophysically as position-dependent preferences for specific DNA shape features. Using the new model, we explore the variation of DNA shape preferences of several transcription factors across various cancer cell lines and cellular conditions. The results reveal DNA shape variations at FOXA1 (Forkhead Box Protein A1) binding sites in steroid-treated MCF7 cells. The new biophysical model is useful for elucidating the finer details of transcription factor–DNA interactions, as well as for predicting cancer mutation effects in the future.
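The subspace restriction described above can be illustrated with a minimal sketch. It assumes that the dinucleotide parameter at each position is a linear combination of a few shape feature values, so that each position carries only one weight per shape feature instead of 16 free dinucleotide parameters. The function names, the number of features, and the shape table are hypothetical placeholders: the numbers are made up for illustration, are not real DNA shape values, and this is not the BayesPI2 implementation.

```python
from itertools import product

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def toy_shape_table(num_features=4):
    """Return a PLACEHOLDER table mapping each dinucleotide to shape values.

    The values are deterministic made-up numbers, not measured or
    predicted DNA shape features.
    """
    table = {}
    for idx, dinuc in enumerate(DINUCLEOTIDES):
        table[dinuc] = [((idx * (k + 3)) % 7) / 7.0 - 0.5
                        for k in range(num_features)]
    return table


def restricted_dinucleotide_params(weights, shape_table):
    """Expand per-position shape weights into dinucleotide parameters.

    weights[i][k] is the (assumed) preference at position i for shape
    feature k. Each returned parameter is sum_k weights[i][k] * shape[k],
    so it lies in the span of the shape feature vectors by construction.
    """
    return [
        {dinuc: sum(w * s for w, s in zip(pos_weights, feats))
         for dinuc, feats in shape_table.items()}
        for pos_weights in weights
    ]


def shape_score(sequence, weights, shape_table):
    """Sum the restricted dinucleotide parameters along a sequence."""
    if len(sequence) != len(weights) + 1:
        raise ValueError("need one weight vector per dinucleotide step")
    params = restricted_dinucleotide_params(weights, shape_table)
    return sum(params[i][sequence[i:i + 2]] for i in range(len(weights)))


if __name__ == "__main__":
    table = toy_shape_table()
    # Position 0 weights only feature 0; position 1 weights only feature 1.
    weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    print(shape_score("ACG", weights, table))
```

Under this assumption, a nonzero weight at a given position reads directly as a preference for, or against, the corresponding shape feature at that position, which is the interpretability the abstract refers to.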

Highlights

  • Understanding how transcription factors (TFs) recognize their target DNA binding sites is an important task in the study of gene regulation

  • Some models aim to identify proteins that may bind to DNA from the protein amino acid sequences (e.g., nDNA-Prot [2]), whereas others focus on predicting protein target sites (e.g., BayesPI2 [3])

  • The main advantage of the new shape-restricted TF–DNA affinity model is the interpretability of the model parameters



Introduction

Understanding how transcription factors (TFs) recognize their target DNA binding sites is an important task in the study of gene regulation. Some models aim to identify proteins that may bind to DNA from the protein amino acid sequences (e.g., nDNA-Prot [2]), whereas others focus on predicting protein target sites (e.g., BayesPI2 [3]). The latter are useful for identifying functional TF binding sites, predicting the effects of mutations on gene regulation [4], and elucidating the differences between related TFs [5]. Experimental biases [14,15]

