Predicting peptides that can bind to major histocompatibility complex (MHC) class I molecules is an important step in the vaccine design process. Computational approaches have the potential to provide good predictive models that reduce both the time and the cost of this process. The position-specific scoring matrix (PSSM) is a reliable approach for modeling amino acid sequences. Building a PSSM requires careful selection of both the data used to construct it and its parameters. In this work, we apply three different data splitting strategies and propose alternative values for the embedded PSSM parameters. The basic principle of data splitting is to choose training data that represents the whole data set. We propose using the Kennard–Stone algorithm to highlight the importance of choosing the data that constitutes the PSSM. Furthermore, this work proposes modifications to the PSSM parameters and studies how the model behaves in response to each change. The model is applied to experimental data for MHC class I molecules. The modified parameters perform comparably to or better than the conventional parameters. Moreover, the Kennard–Stone data splitting algorithm contributed to a significant improvement in model performance.
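To make the splitting principle concrete, the following is a minimal sketch of Kennard–Stone selection, not the implementation used in this work. It assumes equal-length peptides and uses Hamming distance as the dissimilarity measure; the abstract does not state which encoding or distance was used, so `hamming`, `kennard_stone`, and `n_train` are illustrative names only. The procedure seeds the training set with the two most distant samples, then repeatedly adds the sample whose nearest selected neighbor is farthest away.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length peptides differ.

    Assumption: Hamming distance stands in for whatever distance the
    original work used, which the abstract does not specify.
    """
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kennard_stone(samples, n_train, dist=hamming):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(samples)")

    # Step 1: seed with the two samples that are farthest apart.
    first, second = max(
        combinations(range(n), 2),
        key=lambda pair: dist(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]

    # Distance from each remaining sample to its nearest selected sample.
    nearest = {
        i: min(dist(samples[i], samples[first]), dist(samples[i], samples[second]))
        for i in remaining
    }

    # Step 2: repeatedly add the sample farthest from the selected set.
    while len(selected) < n_train:
        pick = max(remaining, key=lambda i: nearest[i])
        selected.append(pick)
        remaining.remove(pick)
        del nearest[pick]
        # Only the newly added sample can shrink a nearest-neighbor distance.
        for i in remaining:
            nearest[i] = min(nearest[i], dist(samples[i], samples[pick]))

    return selected, remaining
```

The indices returned in `train_idx` would select the peptides used to build the PSSM, and those in `test_idx` would be held out for evaluation. The seeding step compares all pairs, so this sketch is quadratic in the number of samples and is meant for illustration on small data sets.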