Abstract

Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Given the significant progress in digital microscopy, automated image-based classification of protein subcellular location is urgently needed. In this paper, we investigate more representative image features that can be used effectively on multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including the completed local binary pattern, the local tetra pattern, and the standard local binary pattern. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and the local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. We also study the combination of these two novel local pattern features with conventional global texture features. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary and thus capable of improving classification accuracy.
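
For readers unfamiliar with the baseline descriptor named above, the following is a minimal sketch of the standard local binary pattern on a 3x3 neighbourhood. It is not the authors' implementation: the neighbour ordering, the `>=` comparison, the 256-bin histogram, and the names `lbp_code` and `lbp_histogram` are assumptions made for illustration, and the completed local binary pattern and local tetra pattern variants are not shown.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel.
# The starting point and direction are a convention assumed for this sketch.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col).

    Each neighbour contributes one bit: 1 if its grey level is greater
    than or equal to the centre pixel, 0 otherwise.
    """
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (border is skipped)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical 3x3 grey-level patch, used only to exercise the functions.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
center_code = lbp_code(patch, 1, 1)
texture_feature = lbp_histogram(patch)
```

The histogram, rather than the per-pixel codes, is what would typically be passed to a classifier as a texture feature vector.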

Highlights

  • Over the past two decades, molecular, subcellular, cellular, and supercellular structures have been visualized manually by biologists; constantly updated techniques for automated microscopic imaging and biological tissue labeling have created revolutionary opportunities for structure visualization [1]

  • Efforts for developing AI-protein subcellular location prediction (PSLP) can generally be summarized in the following four aspects: (1) benchmark dataset preparation, which means assembling an organized collection of data for subsequent work; (2) the image preprocessing level, which includes spatial transformation to bring images into a common reference frame, followed by image normalization and object separation; (3) the image feature level, which includes feature extraction and redundancy removal, where all approaches aim to quantify the inherent properties of the original data effectively and transform the input data into a reduced set of representative features; (4) the classification algorithm level, which focuses on classifier training and paradigm design

  • Six major subcellular locations are considered in this study, so each trained model consists of six classifiers, and the output of each of the six independent support vector machine (SVM) classifiers represents the confidence that a sample belongs to a specific label
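
The binary relevance scheme described in the last highlight can be sketched as follows: a multilabel problem over six locations is split into six independent yes/no problems, and the six confidence scores are then turned back into a label set. This is an illustrative sketch only. The label names `loc_0` to `loc_5`, the zero threshold, the fallback to the top-scoring label, and the example scores are all assumptions, and the SVM training step itself is omitted.

```python
# Placeholder names for the six subcellular locations (hypothetical).
LABELS = ["loc_0", "loc_1", "loc_2", "loc_3", "loc_4", "loc_5"]


def to_binary_targets(label_sets, labels):
    """Split multilabel annotations into one 0/1 target vector per label.

    Each vector would serve as the training target of one independent
    binary classifier (an SVM in the study).
    """
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def predict_label_set(scores, threshold=0.0):
    """Convert per-label confidence scores into a predicted label set.

    Labels scoring above `threshold` are kept. If no label qualifies, the
    single highest-scoring label is returned; this fallback is an
    assumption of the sketch, not something stated in the source.
    """
    chosen = [lab for lab, score in scores.items() if score > threshold]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return chosen


# Hypothetical annotations for three training images.
train_sets = [{"loc_0", "loc_2"}, {"loc_1"}, {"loc_2"}]
targets = to_binary_targets(train_sets, LABELS)

# Hypothetical confidence outputs of the six classifiers for one image.
scores = {"loc_0": 0.8, "loc_1": -1.2, "loc_2": 0.1,
          "loc_3": -0.4, "loc_4": -2.0, "loc_5": -0.3}
predicted = predict_label_set(scores)
```

Because each label is decided independently, a sample can receive several locations at once, which is what makes the scheme suitable for multilabel images.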

Summary

Introduction

Over the past two decades, molecular, subcellular, cellular, and supercellular structures have been visualized manually by biologists; constantly updated techniques for automated microscopic imaging and biological tissue labeling have created revolutionary opportunities for structure visualization [1]. The first problem in the current situation is the sheer volume of image data. The spatial distribution of a target protein in a given cell type is critical to understanding the protein's function and how the cell behaves. Acquiring this spatial distribution information is daunting even for a single cell type, because it is estimated that having a single image for every combination of cell type, protein, and timescale would require on the order of 100 billion images [2].
