Abstract
Background
Prediction of transcriptional regulatory mechanisms in Arabidopsis has become increasingly critical with the explosion of genomic data now available for both gene expression and gene sequence composition. We have shown in previous work [1] that a combination of correlation measurements and cis-regulatory element (CRE) detection methods is effective in predicting targets for candidate transcription factors in specific case studies, which were validated. However, to date there has been no quantitative assessment of which correlation measures or CRE detection methods, used alone or in combination, are most effective in predicting TF→target relationships on a genome-wide scale.
Results
We tested several widely used methods, based on correlation (Pearson and Spearman rank correlation) and CRE detection (≥1 CRE or CRE over-representation), to determine which of these methods, individually or in combination, is the most effective by various measures for making regulatory predictions. To predict the regulatory targets of a transcription factor (TF) of interest, we applied these methods to microarray expression data for genes that were regulated over treatment and control conditions in wild type (WT) plants. Because the chosen data sets included identical experimental conditions applied to TF over-expressor or T-DNA knockout plants, we were able to test the TF→target predictions made using microarray data from WT plants against microarray data from mutant/transgenic plants. For each method, or combination of methods, we computed sensitivity, specificity, positive and negative predictive value, and the F-measure of balance between sensitivity and positive predictive value (precision). This analysis revealed that ≥1 CRE and Spearman correlation (used alone or in combination) were the most balanced CRE detection and correlation methods, respectively, with regard to their power to accurately predict regulatory-target interactions.
Conclusion
These findings provide an approach and guidance for researchers interested in predicting transcriptional regulatory mechanisms using microarray data that they generate (or microarray data that are publicly available) combined with CRE detection in promoter sequence data.
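The evaluation measures named in the Results can be illustrated with a minimal sketch. It assumes each candidate TF→target pair has already been classified as a true positive, false positive, true negative, or false negative by comparing the WT-based prediction with the mutant/transgenic data, and it assumes the F-measure is the standard equally weighted harmonic mean of sensitivity and precision (the abstract does not state the weighting). The function name and the example counts are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard evaluation measures from confusion-matrix counts.

    tp/fp/tn/fn are counts of predicted TF->target pairs checked against
    a reference set (hypothetical inputs, not data from the paper).
    """
    sensitivity = _ratio(tp, tp + fn)   # recall: true targets recovered
    specificity = _ratio(tn, tn + fp)   # non-targets correctly rejected
    ppv = _ratio(tp, tp + fp)           # precision
    npv = _ratio(tn, tn + fn)
    # Assumed F1 form: harmonic mean of precision and sensitivity.
    f_measure = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in confusion_metrics(tp=40, fp=10, tn=90, fn=20).items():
        print(f"{name}: {value:.3f}")
```

A method can score well on one measure and poorly on another (for example, high specificity with low sensitivity), which is why a balance measure such as the F-measure is reported alongside the individual values.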
Highlights
Prediction of transcriptional regulatory mechanisms in Arabidopsis has become increasingly critical with the explosion of genomic data available for both gene expression and gene sequence composition
Recent work in many eukaryotic species has focused on a Systems Biology approach, using multiple associations between genes, to elucidate regulatory networks and to understand their biological context [5,6]. These associations can be used in combination with gene expression data from microarray experiments and promoter sequence analysis of co-regulated genes, to infer the mechanism for this co-regulation and to search for cis-regulatory elements (CREs) that may coordinate this response through transcription factor (TF) activity.
Despite the widespread use of correlation methods in the past, we know very little about the performance of these methods in making TF→target predictions, especially using microarray and sequence data
Summary
Prediction of transcriptional regulatory mechanisms in Arabidopsis has become increasingly critical with the explosion of genomic data available for both gene expression and gene sequence composition. Recent work in many eukaryotic species has focused on a Systems Biology approach, using multiple associations between genes, to elucidate regulatory networks and to understand their biological context [5,6]. These associations can be used in combination with gene expression data from microarray experiments and promoter sequence analysis of co-regulated genes, to infer the mechanism for this co-regulation and to search for cis-regulatory elements (CREs) that may coordinate this response through transcription factor (TF) activity. The set of co-regulated genes can be used to identify candidate TF→target relationships using pair-wise associations between TFs and targets based on correlation over microarray data and/or putative CRE detection. This methodology takes advantage of the current data on CRE binding sites for transcription factors as well as current annotation for transcription factors in Arabidopsis available in databases such as AGRIS [10]. Previous studies from our group have shown that analyzing the co-regulation of genes across various experimental conditions, in combination with CRE analysis of predicted target gene promoters, is effective in predicting new targets for transcription factors, which were experimentally validated [1,11].
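The pair-wise association described above (correlation over expression data combined with putative CRE detection) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the correlation cutoff of 0.8, the use of the absolute correlation (so that both positively and negatively correlated pairs qualify), the forward-strand exact-match motif scan, and all function names are hypothetical choices made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: treat as uncorrelated (assumption)
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def has_cre(promoter, motif):
    """'>=1 CRE' test: at least one exact forward-strand motif match."""
    return motif.upper() in promoter.upper()


def predict_target(tf_expr, gene_expr, promoter, motif, cutoff=0.8):
    """Call a TF->target pair when both criteria hold (hypothetical rule)."""
    correlated = abs(spearman(tf_expr, gene_expr)) >= cutoff
    return correlated and has_cre(promoter, motif)
```

For example, `predict_target([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], "ttgaCACGTGtca", "CACGTG")` evaluates a monotonic but non-linear expression relationship, a case where Spearman and Pearson correlation differ, against a promoter containing one copy of the motif.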