Protein trafficking, or protein sorting, is the mechanism by which a cell transports proteins to their appropriate locations inside or outside the cell. This targeting relies on information contained in the protein itself. Many methods predict the subcellular location of eukaryotic proteins from sequence information, but most of them perform prediction with a flat (single-level) structure. In this work, we introduce ensemble methods that predict locations in the eukaryotic protein-sorting non-membrane pathway hierarchically. For classification, we used features extracted exclusively from full-length protein sequences, combined with feature subset selection. Sequence-driven, sequence-mapped, and sequence-autocorrelation features were tested with ensemble learners, and classifier performance was compared with and without feature subset selection. This study shows that the new features extracted from full-length eukaryotic protein sequences are effective at capturing biological features among compartments in eukaryotic non-membrane pathways at two levels. Feature subset selection also reduced the time taken to build the classification model.
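As a rough illustration of the kind of pipeline described above, the following Python sketch computes two simple features from a full-length sequence and routes a prediction through two levels. It is not the feature set or classifier used in this work: amino acid composition stands in for a sequence-driven feature, a normalized Moreau-Broto autocorrelation over the Kyte-Doolittle hydropathy scale stands in for an autocorrelation feature, and `predict_hierarchical`, its callables, and the labels in the usage note are hypothetical placeholders.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here only as an example property scale
# (an assumption; the abstract does not say which properties were used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(seq):
    """Keep only the 20 standard residues, ignoring case and ambiguity codes."""
    return [a for a in seq.upper() if a in KD]


def composition(seq):
    """Fraction of each standard residue: a simple sequence-driven feature."""
    residues = _standard_residues(seq)
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def autocorrelation(seq, lag, scale=KD):
    """Normalized Moreau-Broto autocorrelation of a residue property at `lag`."""
    values = [scale[a] for a in _standard_residues(seq)]
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must satisfy 1 <= lag < sequence length")
    total = sum(values[i] * values[i + lag] for i in range(n - lag))
    return total / (n - lag)


def featurize(seq, max_lag=3):
    """Concatenate composition (20 values) with autocorrelation at lags 1..max_lag."""
    return composition(seq) + [autocorrelation(seq, d) for d in range(1, max_lag + 1)]


def predict_hierarchical(features, top_level, second_level):
    """Two-level prediction: pick a first-level class, then refine within it.

    `top_level` is any callable returning a first-level label; `second_level`
    maps each first-level label to a callable for that branch (hypothetical).
    """
    parent = top_level(features)
    child_model = second_level.get(parent)
    child = child_model(features) if child_model is not None else None
    return parent, child
```

In practice the callables passed to `predict_hierarchical` would be trained ensemble classifiers, one for the first level and one per branch at the second level, for example `predict_hierarchical(featurize(seq), top_model, {"branch_a": model_a})` with placeholder names. A flat predictor, by contrast, would map the feature vector directly to one of all final locations in a single step.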