Abstract

Transcription factors (TFs) are sequence-specific DNA-binding proteins that are essential for regulating gene expression. Determining TF DNA-binding specificity helps in studying gene regulatory networks within cells and in understanding how genetic variation can disrupt normal gene expression. One way to characterize TF specificity is to train Support Vector Machines (SVMs) on chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) data. TF-DNA binding preferences can also be determined from Systematic Evolution of Ligands by Exponential Enrichment (SELEX) data. In this project, I implemented a gapped k-mer SVM to study TF-DNA binding preferences using SELEX-seq data. Specifically, I used a large-scale gapped k-mer SVM, a sequence-based SVM for analyzing TF specificity, which builds a predictive model trained on bound and unbound sequences from SELEX data. I applied it to the T-box transcription factor 5 (TBX5). When tested, the trained TBX5 model achieved an area under the receiver operating characteristic curve (AUROC) of 0.8248, well above the 0.5 expected by chance, indicating that it reliably distinguishes bound from unbound sequences. In addition, the highest-scoring sequences contained TBX5 motifs. Given these results, I conclude that the SVM was successfully implemented. Because SELEX data had not previously been used to train SVM-based predictive models, these results also indicate that SELEX data are compatible with, and useful for, developing such models.
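
The two quantities the abstract relies on, gapped k-mer features and the AUROC, can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names, the window length `l=6`, the number of informative positions `k=4`, the toy sequences, and the toy motif `ACGTAC` are all assumptions made for illustration. The count-difference scorer is a simple stand-in for the trained SVM, and real gapped k-mer SVMs evaluate the kernel without enumerating features explicitly as done here.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=6, k=4):
    """Count gapped k-mers: each length-l window with l-k positions masked as '.'.

    l and k are illustrative defaults, not values taken from the study.
    """
    counts = Counter()
    masks = [set(m) for m in combinations(range(l), k)]  # informative positions
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in masks:
            key = "".join(c if i in keep else "." for i, c in enumerate(window))
            counts[key] += 1
    return counts


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_count_difference(bound, unbound, l=6, k=4):
    """Stand-in for SVM training: weight = count in bound minus count in unbound."""
    weights = Counter()
    for seq in bound:
        weights.update(gapped_kmer_counts(seq, l, k))
    for seq in unbound:
        weights.subtract(gapped_kmer_counts(seq, l, k))
    return weights


def score(seq, weights, l=6, k=4):
    """Linear score: dot product of a sequence's feature counts with the weights."""
    feats = gapped_kmer_counts(seq, l, k)
    return sum(n * weights.get(key, 0) for key, n in feats.items())


# Toy data (hypothetical): "bound" sequences carry the made-up motif ACGTAC.
train_bound = ["TTACGTACTT", "GGACGTACGG"]
train_unbound = ["TTTTTTTTTT", "GGGGGGGGGG"]
test_seqs = ["CCACGTACCC", "CCCCCCCCCC"]
test_labels = [1, 0]

weights = fit_count_difference(train_bound, train_unbound)
test_scores = [score(s, weights) for s in test_seqs]
print("toy AUROC:", auroc(test_scores, test_labels))
```

An AUROC of 0.5 corresponds to random ranking of bound versus unbound sequences and 1.0 to perfect ranking, which is the scale on which the reported 0.8248 should be read.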
