Abstract

Transcription factors are key mediators of human complex disease processes. Identifying the target genes of transcription factors will increase our understanding of the biological network leading to disease risk. The prediction of transcription factor binding sites (TFBSs) is one method to identify these target genes; however, current prediction methods need improvement. We chose the transcription factor upstream stimulatory factor 1 (USF1) to evaluate the performance of our novel TFBS prediction method because of its known genetic association with coronary artery disease (CAD) and the recent availability of USF1 chromatin immunoprecipitation microarray (ChIP-chip) results. The specific goals of our study were to develop a novel and accurate genome-scale method for predicting USF1 binding sites and associated target genes to aid in the study of CAD. Previously published USF1 ChIP-chip data for 1 per cent of the genome were used to develop and evaluate several kernel logistic regression prediction models. A combination of genomic features (phylogenetic conservation, regulatory potential, presence of a CpG island and DNaseI hypersensitivity), as well as position weight matrix (PWM) scores, was used as variables for these models. Our most accurate predictor achieved an area under the receiver operating characteristic curve of 0.827 during cross-validation experiments, significantly outperforming standard PWM-based prediction methods. When applied to the whole human genome, we predicted 24,010 USF1 binding sites within 5 kilobases upstream of the transcription start site of 9,721 genes. These predictions included 16 of 20 genes with strong evidence of USF1 regulation. Finally, in the spirit of genomic convergence, we integrated independent experimental CAD data with these USF1 binding site prediction results to develop a prioritised set of candidate genes for future CAD studies.
We have shown that our novel prediction method, which employs genomic features related to the presence of regulatory elements, enables more accurate and efficient prediction of USF1 binding sites. This method can be extended to other transcription factors identified in human disease studies to help further our understanding of the biology of complex disease.
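
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen true binding site receives a higher score than a randomly chosen non-site. The following is a minimal sketch of that calculation using the pairwise (Mann-Whitney) formulation; the function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the evaluation code or data used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = bound region, 0 = unbound region)
    scores: iterable of predictor outputs, higher = more likely bound
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three of the four positive/negative
# pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is adequate for small illustrations; a rank-based implementation would be preferable for genome-scale data.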

Highlights

  • Prediction method development: We assessed the merits of predicting upstream stimulatory factor 1 binding sites (USF1–BSs) using (1) DNA sequence alone; (2) sequence with single genomic features; and (3) sequence with multiple genomic features, to identify putative USF1–BSs within the ENCODE regions

  • We started by using the position weight matrix (PWM) scoring method to identify potential USF1 transcription factor binding sites (USF1–BSs)
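
PWM scoring, as mentioned in the last highlight, can be sketched as follows. This is a generic log-odds PWM scan under stated assumptions, not the authors' implementation: the count matrix is invented for illustration (it simply favours the E-box sequence CACGTG), the background is uniform, only the forward strand is scanned, and the names `build_pwm`, `score_window` and `scan` are hypothetical.

```python
import math

# Hypothetical count matrix for a 6-bp motif. The numbers are made up
# for illustration and are not taken from the paper or any database.
COUNTS = {
    "A": [1, 17, 1, 1, 1, 1],
    "C": [17, 1, 17, 1, 1, 1],
    "G": [1, 1, 1, 17, 1, 17],
    "T": [1, 1, 1, 1, 17, 1],
}
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {}
        for b in "ACGT":
            freq = (counts[b][i] + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-position weights for one sequence window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = build_pwm(COUNTS, BACKGROUND)
hits = scan(pwm, "TTTCACGTGAAA", threshold=8.0)
```

With these toy counts, only an exact CACGTG match clears the threshold of 8.0. In practice the threshold trades sensitivity against false positives, which is one reason sequence-only scoring benefits from the additional genomic features described above.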


Summary

Introduction

Several transcription factors (TFs) have been characterised as mediators of complex disease processes.[1,2,3] Numerous publications have identified single nucleotide polymorphisms (SNPs) in TFs that are significantly associated with coronary artery disease (CAD).[2,4,5] This combined evidence suggests that the target genes of these TFs may be associated with human complex disease. Identification of potential TF targets could further our understanding of gene–gene interactions underlying complex disease. Genome-wide experimental methods, such as chromatin immunoprecipitation microarray (ChIP-chip),[6,7] a technique combining chromatin immunoprecipitation and microarray analysis for identifying TF-interacting genomic regions, are time-consuming and expensive. It would be more efficient to develop an in silico computational method for TF target prediction, followed by less costly genotyping and more focused molecular biology experiments, to identify the association between gene–gene interactions and complex disease.
