Understanding protein subcellular localization is essential in proteomics research. Advances in molecular biology and computer science have made it possible to use computational approaches to locate proteins within cells. Confocal microscopy, the imaging method used by the Human Protein Atlas (HPA), is an excellent way to locate proteins. Classifying human proteins from such images can help researchers better understand human pathology and help physicians automate the interpretation of medical images. The HPA comprises millions of images, each annotated with one or more labels. However, only a few approaches have been developed for automatically predicting protein localization, and they focus mainly on single-label classification. A recognition system that performs multi-label classification of HPA images with acceptable performance is therefore needed. This study develops a deep learning-based system for the multi-label classification of HPA images. Specifically, it proposes two architectures that automatically extract features from the images and predict protein localization across 28 subcellular compartments: first, a convolutional neural network (CNN) trained from scratch, and second, an ensemble model built from transfer learning architectures. The results show that both models perform well at classifying protein localization in the major cellular organelles. However, the proposed CNN outperforms the ensemble model on images with multiple simultaneous protein localizations. The models were evaluated on three performance criteria: recall, precision, and F1-score. The proposed CNN outperforms the ensemble model, achieving a recall of 0.75, a precision of 0.75, and an F1-score of 0.74.
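The abstract does not state how multi-label predictions are thresholded or how recall, precision, and F1-score are averaged over the 28 compartments. As a minimal sketch, assuming independent per-class sigmoid probabilities, a fixed 0.5 threshold, and macro averaging (all assumptions, not the authors' stated setup), the evaluation could be computed as below. The helpers `to_multi_hot`, `threshold`, and `macro_prf` are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple

NUM_CLASSES = 28  # number of subcellular compartments in the HPA task


def to_multi_hot(label_ids: Sequence[int], num_classes: int = NUM_CLASSES) -> List[int]:
    """Encode the set of compartments an image is annotated with as a 0/1 vector."""
    vec = [0] * num_classes
    for idx in label_ids:
        vec[idx] = 1
    return vec


def threshold(probs: Sequence[float], t: float = 0.5) -> List[int]:
    """Turn per-class probabilities into 0/1 predictions.

    The 0.5 cut-off is an assumption; it could also be tuned per class.
    """
    return [1 if p >= t else 0 for p in probs]


def macro_prf(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over all classes.

    Each class is scored independently; a class with no predicted (or no true)
    positives gets precision (or recall) 0.0 instead of raising ZeroDivisionError.
    """
    n_classes = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return (
        sum(precisions) / n_classes,
        sum(recalls) / n_classes,
        sum(f1s) / n_classes,
    )


if __name__ == "__main__":
    # Toy example with 3 classes instead of 28 to keep it readable.
    y_true = [to_multi_hot([0, 2], 3), to_multi_hot([1], 3)]
    probs = [[0.9, 0.2, 0.4], [0.1, 0.7, 0.6]]
    y_pred = [threshold(p) for p in probs]
    print(tuple(round(x, 3) for x in macro_prf(y_true, y_pred)))  # prints (0.667, 0.667, 0.667)
```

With macro averaging, the F1-score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall. Micro averaging, which pools true positives, false positives, and false negatives across all classes before computing the ratios, is a common alternative and can give different numbers on imbalanced labels.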