Abstract

In recent years, the dramatic increase in the number of applications of massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can decipher regulatory codes across diverse MPRAs does not yet exist. Here, we propose a new computational method to predict the transcriptional activity of MPRAs, as well as luciferase reporter assays, based on the TRANScription FACtor database (TRANSFAC). We employed regression trees and multivariate adaptive regression splines (MARS) to obtain these predictions and introduced a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite differences in transfected cell types, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated from independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggests that active transcription factor binding sites shared across different cell types make greater contributions to transcriptional activity and that known promoter activity can, in some instances, inform the transcriptional activity of unknown promoters regardless of cell type.

Highlights

  • In metazoan cells, gene expression is regulated by various protein–protein and DNA–DNA interactions, as well as by protein–DNA interactions in which transcription factors (TFs) bind to functional DNA segments; these interactions are pervasive in transcriptional initiation, elongation and termination

  • We propose a new computational method to predict the transcriptional activity of different massively parallel reporter assays (MPRAs), as well as luciferase reporter assays, by combining the TRANScription FACtor database [21,22] (TRANSFAC) with regression trees and MARS

  • The method consists of four steps (Figure 1): (1) the data are pre-processed; (2) the TRANSFAC database is introduced, each sequence is characterized by transcription factor binding site (TFBS) enrichment scores that serve as explanatory variables, and the explanatory variables of the different sequences are assembled into an explanatory variable matrix; (3) the explanatory variable matrix and the corresponding transcriptional activities are input into a feature redundancy-dependent sizing regression tree, which uses the proposed feature redundancy-dependent formula to adapt to diverse data sets, to construct clusters; and (4) MARS is used to construct predictive functions for the individual clusters obtained in the third step
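Steps (3) and (4) above can be illustrated with a deliberately small sketch. It is not the authors' implementation: the feature redundancy-dependent sizing formula is not given in this excerpt, so a single least-squares split stands in for the regression tree, and a one-term hinge function `b0 + b1 * max(0, x - knot)` stands in for a full MARS model. The toy enrichment scores, the activities, and all function names are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Single regression-tree split: (feature, threshold) with the lowest child SSE."""
    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best[1], best[2]

def fit_hinge(x, y):
    """Best one-hinge fit; returns (error, knot, b0, b1). Stand-in for MARS."""
    best = (sse(y), x[0], mean(y), 0.0)  # constant model as fallback
    for knot in sorted(set(x))[:-1]:
        h = [max(0.0, xi - knot) for xi in x]
        hm, ym = mean(h), mean(y)
        var = sum((hi - hm) ** 2 for hi in h)
        b1 = sum((hi - hm) * (yi - ym) for hi, yi in zip(h, y)) / var
        b0 = ym - b1 * hm
        err = sum((b0 + b1 * hi - yi) ** 2 for hi, yi in zip(h, y))
        if err < best[0]:
            best = (err, knot, b0, b1)
    return best

def fit(X, y):
    """Step 3: split sequences into two clusters. Step 4: one hinge model per cluster."""
    j, t = best_split(X, y)
    models = {}
    for side in (False, True):
        rows = [(row, yi) for row, yi in zip(X, y) if (row[j] > t) == side]
        ys = [yi for _, yi in rows]
        # Try every feature and keep the hinge fit with the lowest error.
        fits = [(fit_hinge([r[k] for r, _ in rows], ys), k) for k in range(len(X[0]))]
        (_, knot, b0, b1), k = min(fits)
        models[side] = (k, knot, b0, b1)
    return j, t, models

def predict(model, row):
    """Route a sequence to its cluster, then apply that cluster's hinge function."""
    j, t, models = model
    k, knot, b0, b1 = models[row[j] > t]
    return b0 + b1 * max(0.0, row[k] - knot)

# Hypothetical explanatory variable matrix: rows are sequences, columns are
# enrichment scores for two made-up TFBS features.
X = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 3.0],
     [2.0, 0.0], [2.0, 2.0], [2.0, 3.0], [2.0, 4.0]]
y = [1.0, 1.0, 3.0, 5.0, 10.0, 10.0, 13.0, 16.0]  # made-up activities

model = fit(X, y)
print(round(predict(model, [0.0, 2.5]), 3), round(predict(model, [2.0, 5.0]), 3))
```

The point of the two-stage design is that sequences in different clusters get different predictive functions, so a binding site can have a different quantitative effect depending on which other sites are present.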

Summary

Introduction

Gene expression is regulated by various protein–protein and DNA–DNA interactions, as well as by protein–DNA interactions in which transcription factors (TFs) bind to functional DNA segments; these interactions are pervasive in transcriptional initiation, elongation and termination. The dramatic increase in the number of applications of MPRA technology [2,3,8,9,10,11,12,13] has produced a large body of data from cis-element reporter assays for different purposes, such as investigating genomic variants [10], distinguishing the functions of promoters and enhancers [11], and analyzing motifs or transcription factor binding sites (TFBSs) [8,13]. In MPRA, transcriptional activity is generally measured as the ratio of mRNA barcode counts to template DNA barcode counts.
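As a rough illustration of the barcode-count ratio, the sketch below computes an activity value per barcode from made-up counts. The function name, the barcode labels, and the library-size normalization are assumptions added for the example; individual MPRA studies differ in how they normalize and aggregate barcodes.

```python
def activity_ratios(rna_counts, dna_counts):
    """Ratio of mRNA to template DNA barcode counts, per barcode.

    Counts are first scaled by library size (an assumption made here so that
    sequencing depth does not dominate the ratio). Barcodes with zero DNA
    counts are skipped because their ratio is undefined.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios = {}
    for barcode, dna in dna_counts.items():
        if dna == 0:
            continue
        rna_fraction = rna_counts.get(barcode, 0) / rna_total
        dna_fraction = dna / dna_total
        ratios[barcode] = rna_fraction / dna_fraction
    return ratios

# Hypothetical counts for three barcodes.
rna = {"bc_A": 300, "bc_B": 100, "bc_C": 0}
dna = {"bc_A": 100, "bc_B": 100, "bc_C": 200}

ratios = activity_ratios(rna, dna)
for barcode, value in ratios.items():
    print(barcode, round(value, 3))
```

A ratio above 1 means the barcode is over-represented in mRNA relative to its share of the template DNA, which is read as higher transcriptional activity of the linked sequence.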
