Abstract

The promoter region, located proximal to transcription start sites, controls the initiation of gene transcription by modulating its interaction with RNA polymerase. Accurate recognition of promoter regions is therefore a critical task in bioinformatics. Although several methods leveraging pre-trained language models (PLMs) for promoter prediction have been proposed, the full potential of such PLMs remains largely untapped. In this study, we introduce PLPMpro, a model that combines prompt learning with a pre-trained language model to enhance promoter sequence prediction. PLPMpro uses the prompt-learning paradigm to more fully exploit the knowledge encoded in the PLM, yielding substantial improvements in prediction performance. Experimental results demonstrate the efficacy of prompt learning in strengthening the pre-trained model: PLPMpro surpasses both typical pre-trained-model-based methods for promoter prediction and typical deep learning methods. Furthermore, we conduct experiments to examine the effects of different prompt-learning settings and different numbers of soft modules on model performance. More importantly, an interpretation experiment reveals that the pre-trained model captures biological semantics. Collectively, this research offers a novel perspective on how best to utilize PLMs for biological problems.
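To make the idea concrete, the sketch below illustrates one common form of prompt learning, soft-prompt tuning, over a frozen DNA language model for binary promoter classification. It is a minimal illustration rather than PLPMpro's actual implementation: the DNABERT checkpoint name, the number of soft-prompt tokens, and the linear classification head are assumptions introduced here for demonstration.

```python
# Minimal sketch of soft-prompt tuning for promoter classification,
# in the spirit of PLPMpro but NOT its exact method.
# Assumed/illustrative choices: a DNABERT-style backbone
# ("zhihan1996/DNA_bert_6"), 20 soft-prompt tokens, a linear head.
import torch
import torch.nn as nn
from transformers import BertModel

class SoftPromptPromoterClassifier(nn.Module):
    def __init__(self, backbone="zhihan1996/DNA_bert_6", n_prompt_tokens=20):
        super().__init__()
        self.plm = BertModel.from_pretrained(backbone)
        for p in self.plm.parameters():      # freeze the pre-trained LM;
            p.requires_grad = False          # only prompt and head are trained
        hidden = self.plm.config.hidden_size
        # Trainable continuous "soft prompt" vectors prepended to the input.
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)
        self.head = nn.Linear(hidden, 2)     # promoter vs. non-promoter

    def forward(self, input_ids, attention_mask):
        # Look up word embeddings only; BERT adds position embeddings
        # internally when given inputs_embeds.
        tok = self.plm.embeddings.word_embeddings(input_ids)        # (B, L, H)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        embeds = torch.cat([prompt, tok], dim=1)                    # (B, P+L, H)
        pmask = attention_mask.new_ones(tok.size(0), prompt.size(1))
        out = self.plm(inputs_embeds=embeds,
                       attention_mask=torch.cat([pmask, attention_mask], dim=1))
        # Classify from the first position (a prompt token), a simple choice
        # made for this sketch.
        return self.head(out.last_hidden_state[:, 0])
```

During training, gradients flow only into the soft prompt and the head, which is what allows the frozen PLM's pre-trained knowledge to be reused at a small tuning cost.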
