Definition of the binding specificity of the T7 bacteriophage primase by analysis of a protein binding microarray using a thermodynamic model.

Georg Lipps

doi:10.1093/nar/gkae215

Abstract

Protein binding microarrays (PBM), SELEX, RNAcompete and chromatin immunoprecipitation have been used extensively to determine the specificity of nucleic acid binding proteins. While determining the specificity of proteins with pronounced sequence specificity is straightforward, it is more difficult for proteins of modest sequence specificity. In this work, an exploratory data analysis workflow for nucleic acid binding data was developed that can be used by scientists who want to analyse their own binding data. The workflow is based on a regressor implemented in scikit-learn, the major machine learning module for the scripting language Python. The regressor is built on a thermodynamic model of nucleic acid binding and describes the sequence specificity with base- and position-specific energies. The regressor was used to determine the binding specificity of the T7 primase. For this, we reanalysed the binding data of the T7 primase obtained with a custom PBM. The binding specificity of the T7 primase agrees with the priming specificity (5'-GTC) and with the template (5'-GGGTC) for the preferentially synthesized tetraribonucleotide primer (5'-pppACCC), but is more relaxed. The dominant contribution of two positions in the motif can be explained by the involvement of the initiating and elongating nucleotides in template binding.
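As an illustration of what "base- and position-specific energies" can mean in a thermodynamic binding model, the following minimal Python sketch computes the binding energy of a short site as a sum of per-position contributions and converts it into an occupancy with a two-state (bound/unbound) expression. It is not the regressor described in this work: the matrix values, the three-nucleotide site length, the names `ENERGY_MATRIX`, `binding_energy` and `occupancy`, and the parameter `mu` (a chemical potential in units of RT) are all assumptions made for the example.

```python
import math

# Hypothetical energies in units of RT for a 3-nt site; lower means tighter
# binding. The numbers are invented for illustration, not fitted to any data.
ENERGY_MATRIX = [
    {"A": 1.0, "C": 1.5, "G": 0.0, "T": 1.2},
    {"A": 0.8, "C": 0.9, "G": 1.1, "T": 0.0},
    {"A": 2.0, "C": 0.0, "G": 1.8, "T": 1.6},
]


def binding_energy(site, matrix=ENERGY_MATRIX):
    """Sum the base- and position-specific energies over the site."""
    if len(site) != len(matrix):
        raise ValueError("site length must match the number of matrix positions")
    return sum(position[base] for position, base in zip(matrix, site))


def occupancy(site, mu=0.0, matrix=ENERGY_MATRIX):
    """Two-state binding probability 1 / (1 + exp(E - mu)), energies in RT."""
    energy = binding_energy(site, matrix)
    return 1.0 / (1.0 + math.exp(energy - mu))


if __name__ == "__main__":
    for site in ("GTC", "ATC", "GTA"):
        print(site, binding_energy(site), round(occupancy(site), 3))
```

In a regression setting, the entries of such a matrix would be the parameters estimated from measured binding signals. The additive form assumes that positions contribute independently to the binding energy.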

Full Text