Abstract
The ability to accurately predict the DNA targets and interacting cofactors of transcriptional regulators from genome-wide data can significantly advance our understanding of gene regulatory networks. NKX2-5 is a homeodomain transcription factor that sits high in the cardiac gene regulatory network and is essential for normal heart development. We previously identified genomic targets for NKX2-5 in mouse HL-1 atrial cardiomyocytes using DNA-adenine methyltransferase identification (DamID). Here, we apply machine learning algorithms and propose a knowledge-based feature selection method for predicting NKX2-5 protein : protein interactions based on motif grammar in genome-wide DNA-binding data. We assessed model performance using leave-one-out cross-validation and a completely independent DamID experiment performed with replicates. In addition to identifying previously described NKX2-5-interacting proteins, including GATA, HAND and TBX family members, a number of novel interactors were identified, with direct protein : protein interactions between NKX2-5 and retinoid X receptor (RXR), paired-related homeobox (PRRX) and Ikaros zinc fingers (IKZF) validated using the yeast two-hybrid assay. We also found that the interaction of RXRα with NKX2-5 was altered by NKX2-5 mutations found in congenital heart disease (Q187H, R189G and R190H). These findings highlight an intuitive approach to accessing protein–protein interaction information of transcription factors in DNA-binding experiments.
Highlights
Complex gene regulatory networks (GRNs) guide development and tissue homeostasis in all organisms
We sought to determine if NKX2-5 targets could be classified based on the motif grammar embedded within their peaks, relative to a random peak set generated from sequences represented on the Affymetrix promoter microarray chip used for DNA-adenine methyltransferase identification (DamID) experiments [16]
Using a knowledge-based machine-learning approach, we identified and validated a number of novel NKX2-5 protein interactors, retinoid X receptor α (RXRα), paired-related homeobox 2 (PRRX2) and IKZF1/LYF-1, and their paralogues PRRX1a, PRRX1b, and IKZF3 and IKZF5
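The idea of classifying target peaks against a random peak set and scoring the classifier by leave-one-out cross-validation can be illustrated with a minimal sketch. It is not the authors' pipeline: the motif counts are invented toy data, the nearest-centroid classifier is a stand-in chosen only for brevity, and the names `PEAKS`, `predict` and `leave_one_out_accuracy` are hypothetical.

```python
from math import dist

# Toy data (invented): each peak is described by counts of three hypothetical
# motifs. Label 1 = target peak, label 0 = random background peak.
PEAKS = [
    ([3, 2, 0], 1),
    ([4, 1, 1], 1),
    ([3, 3, 0], 1),
    ([0, 1, 3], 0),
    ([1, 0, 4], 0),
    ([0, 0, 3], 0),
]


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(train, x):
    """Assign x to the class whose training centroid is closest (Euclidean)."""
    labels = sorted({label for _, label in train})
    centroids = {
        label: centroid([feats for feats, lab in train if lab == label])
        for label in labels
    }
    return min(labels, key=lambda label: dist(centroids[label], x))


def leave_one_out_accuracy(data):
    """Hold out each peak once, train on the rest, and score the prediction."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, x) == y:
            correct += 1
    return correct / len(data)


accuracy = leave_one_out_accuracy(PEAKS)
print(f"Leave-one-out accuracy on toy data: {accuracy:.2f}")
```

With real data, each feature vector would come from scanning peak sequences for motifs, and the held-out estimate would be complemented by an independent test set, as the abstract describes for the replicate DamID experiment.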
Summary
Complex gene regulatory networks (GRNs) guide development and tissue homeostasis in all organisms. Machine-learning algorithms have been applied to genome-wide datasets to make novel predictions related to cardiac GRN function. These studies have focused on predicting muscle-specific enhancers from validated training sets [8,9] or identifying known and novel TFs governing heart precursor and organ development based on sequence-level discriminators (motif grammar) [10,11]. While such studies have demonstrated the power of machine-learning approaches