Integrating Pre-Trained protein language model and multiple window scanning deep learning networks for accurate identification of secondary active transporters in membrane proteins

Muhammad Shahid Malik, Yu-Yen Ou

doi:10.1016/j.ymeth.2023.10.008

Abstract

Secondary active transporters play pivotal roles in regulating ion and molecule transport across cell membranes, with implications for diseases such as cancer. However, studying transporters through biochemical experiments is challenging. We propose an effective computational approach to identify secondary active transporters from membrane protein sequences using pre-trained language models and deep learning neural networks. Our dataset comprised 290 secondary active transporters and 5,420 other membrane proteins from UniProt. Three types of features were extracted: one-hot encodings, position-specific scoring matrix profiles, and contextual embeddings from the ProtTrans language model. A multi-window convolutional neural network architecture scanned the ProtTrans embeddings using varying window sizes to capture multi-scale sequence patterns. The proposed model, which combines ProtTrans embeddings with multi-window convolutional neural networks, achieved 86% sensitivity, 99% specificity and 98% overall accuracy in identifying secondary active transporters, outperforming conventional machine learning approaches. This work demonstrates the promise of integrating pre-trained language models such as ProtTrans with multi-scale deep neural networks to interpret transporter sequences for functional analysis. Our approach enables more accurate computational identification of secondary active transporters, advancing membrane protein research.
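The multi-window scanning idea can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the window sizes (3, 5, 7), the number of filters per window, the embedding dimension of 8, and the random filter weights are all assumptions chosen for illustration, and the random per-residue vectors only stand in for real ProtTrans embeddings. In a trained network the filter weights would be learned rather than sampled.

```python
import random

def scan_window(embeddings, filt):
    """Slide one filter (window x dim weights) along the sequence,
    apply ReLU, and keep the maximum activation (global max pooling)."""
    window = len(filt)
    best = 0.0  # ReLU floor: negative activations are clipped to 0
    for start in range(len(embeddings) - window + 1):
        score = sum(
            w * x
            for row_w, row_x in zip(filt, embeddings[start:start + window])
            for w, x in zip(row_w, row_x)
        )
        best = max(best, score)
    return best

def multi_window_features(embeddings, window_sizes, filters_per_window, rng):
    """Scan the same embedding matrix with several window sizes and
    concatenate the pooled outputs into one feature vector."""
    dim = len(embeddings[0])
    features = []
    for window in window_sizes:
        for _ in range(filters_per_window):
            # Hypothetical random weights; a real model would learn these.
            filt = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(window)]
            features.append(scan_window(embeddings, filt))
    return features

rng = random.Random(0)
seq_len, dim = 30, 8  # toy sizes, not the real ProtTrans dimensions
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
window_sizes = [3, 5, 7]  # assumed values for illustration
features = multi_window_features(embeddings, window_sizes, 4, rng)
print(len(features))  # one pooled value per (window size, filter) pair
```

Because each window size is pooled to a fixed number of values, the concatenated feature vector has the same length regardless of protein length, which is what would let a downstream classifier consume sequences of varying size.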

Full Text