Heat shock proteins (HSPs) from different families and sub-types play a vital role in the folding and unfolding of proteins, in maintaining cellular health, and in preventing serious disorders. Previous computational methods for HSP classification have shown promising performance. However, most existing methods rely heavily on amino acid composition features and still face challenges in interpretability and accuracy. To address these issues, we introduce a novel frequent sequential pattern (FSP)-based method for analyzing and classifying HSPs, their families, and sub-types. The proposed method, FSP4HSP (“FSP for HSP”), identifies FSPs of amino acids (FSPAAs) and uses them for analysis and classification. Besides FSPAAs, sequential rules among amino acids are also discovered. Both binary and multi-class classification scenarios are considered, using eight integer-based and four string-based classifiers. Incorporating FSPAAs into the classification task enhances the interpretability of FSP4HSP, and a comprehensive performance comparison using various evaluation measures demonstrates that it surpasses existing methods for HSP classification.
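To make the idea of frequent patterns of amino acids concrete, the following is a minimal sketch and not the FSP4HSP implementation. It assumes a simplified setting in which a pattern is a contiguous run of residues and its support is the number of sequences that contain it; the abstract does not specify the mining algorithm, and FSP miners typically also allow gaps. The function names `frequent_patterns` and `to_features`, the parameters `min_support` and `max_len`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def frequent_patterns(sequences, min_support, max_len=3):
    """Return {pattern: support} for contiguous patterns of length 1..max_len.

    Simplifying assumption: patterns are contiguous substrings, and support
    is the number of sequences that contain the pattern at least once.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for k in range(1, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}


def to_features(seq, patterns):
    """Encode a sequence as a 0/1 vector: one entry per pattern, sorted by name."""
    return [int(p in seq) for p in sorted(patterns)]


# Toy sequences, made up for illustration (not real HSP data).
toy = ["MAKLV", "MAKGV", "TAKLV"]
patterns = frequent_patterns(toy, min_support=3, max_len=3)
features = to_features("MAKLV", patterns)
```

Each entry in such a vector corresponds to a named pattern, which is one way pattern-based features can stay readable: a classifier's decision can be traced back to specific residue patterns rather than to aggregate composition statistics.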