Abstract

Biological prediction of transcription factor binding sites and their corresponding transcription factor target genes (TFTGs) makes great contribution to understanding the gene regulatory networks. However, these approaches are based on laborious and time-consuming biological experiments. Numerous computational approaches have shown great potential to circumvent laborious biological methods. However, the majority of these algorithms provide limited performances and fail to consider the structural property of the datasets. We proposed a refined systematic computational approach for predicting TFTGs. Based on previous work done on identifying auxin response factor target genes from Arabidopsis thaliana co-expression data, we adopted a novel reverse-complementary distance-sensitive n-gram profile algorithm. This algorithm converts each upstream sub-sequence into a high-dimensional vector data point and transforms the prediction task into a classification problem using support vector machine-based classifier. Our approach showed significant improvement compared to other computational methods based on the area under curve value of the receiver operating characteristic curve using 10-fold cross validation. In addition, in the light of the highly skewed structure of the dataset, we also evaluated other metrics and their associated curves, such as precision-recall curves and cost curves, which provided highly satisfactory results.

Highlights

  • Unraveling the gene regulatory networks is regarded as one of the fundamental problems challenging biologists [1]

  • Gene expression is systematically controlled by regulatory proteins known as transcription factors (TFs) that bind to specific cognate DNA sites known as transcription factor binding sites (TFBSs)

  • reverse-complementary distance-sensitive n-gram profiling (RCDSNGP), 2 455 252 unique features were generated from 1000-bp upstream of 2787 sequences (186 transcription factor target genes (TFTGs) +2601 nonTFTGs)

Read more

Summary

Introduction

Unraveling the gene regulatory networks is regarded as one of the fundamental problems challenging biologists [1]. Gene expression is systematically controlled by regulatory proteins known as transcription factors (TFs) that bind to specific cognate DNA sites known as transcription factor binding sites (TFBSs). Through interacting with other cis-elements, these TFs can function as repressors preventing transcription by inhibiting the activity of RNA polymerization complex either by directly binding to TFBSs or indirectly modifying transcription factor target genes (TFTGs). Transcription factors can function as activators, which promote the expression of TFTGs. In addition to posttranscriptional gene regulations, there are post-translational gene modification regulations, including biochemical alteration and RNA interference [2,3]. The interplay among the corresponding TFs, TFBSs, and TFTGs remains the predominant mechanism governing the gene regulatory processes

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call