Quantitative analysis of visual codewords of a protein distance matrix.

Jure Pražnikar, Nuwan Tharanga Attygalle

doi:10.1371/journal.pone.0263566

Abstract

3D protein structures can be analyzed using a distance matrix calculated as the pairwise distance between all Cα atoms in the protein model. Although researchers have efficiently used distance matrices to classify proteins and find homologous proteins, much less work has been done on quantitative analysis of distance matrix features. Therefore, the distance matrix was analyzed as a grayscale image using the KAZE feature extraction algorithm with the Bag of Visual Words model. In this study, each protein was represented as a histogram of visual codewords. The analysis showed that a very small number of codewords (~1%) have a high relative frequency (> 0.25) and that the majority of codewords have a relative frequency around 0.05. We have also shown that there is a relationship between the frequency of codewords and the position of the features in a distance matrix. The more frequent codewords are located closer to the main diagonal. Less frequent codewords, on the other hand, are located in the corners of the distance matrix, far from the main diagonal. Moreover, the analysis showed a correlation between the number of unique codewords and the 3D repeats in the protein structure. Solenoid and tandem repeat proteins have a significantly lower number of unique codewords than globular proteins. Finally, the codeword histograms and a Support Vector Machine (SVM) classifier were used to classify solenoid and globular proteins. The results showed that the SVM classifier fed with codeword histograms correctly classified 352 out of 354 proteins.
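The histogram representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the 2-D toy descriptors stand in for real KAZE descriptors, the four-entry `codebook` is hypothetical (in practice a codebook would be learned by clustering descriptors from many proteins), and the definition of `unique_codeword_ratio` as distinct codewords divided by the number of features is an assumption made here for illustration.

```python
import math
from collections import Counter

def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def codeword_histogram(descriptors, codebook):
    """Count how many descriptors are assigned to each codeword."""
    counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]

def unique_codeword_ratio(histogram):
    """Distinct codewords used, divided by the number of features (assumed definition)."""
    total = sum(histogram)
    return sum(1 for c in histogram if c > 0) / total if total else 0.0

# Hypothetical codebook and toy descriptors; real KAZE descriptors are much
# higher-dimensional and would come from a feature extractor.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
repeat_like = [(0.1, 0.0), (0.0, 0.1), (0.05, 0.05), (0.9, 0.1), (0.1, -0.1), (1.1, 0.0)]
globular_like = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (1.0, 1.1), (0.0, 1.0), (0.95, 0.0)]

hist_repeat = codeword_histogram(repeat_like, codebook)
hist_globular = codeword_histogram(globular_like, codebook)
ratio_repeat = unique_codeword_ratio(hist_repeat)
ratio_globular = unique_codeword_ratio(hist_globular)
```

In this toy input the repeat-like set reuses the same few codewords, so its ratio is lower than that of the globular-like set, which mirrors the direction of the effect reported in the abstract without reproducing any of its numbers.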

Highlights

The analysis of protein structures using the distance matrix of Cα atoms has a long history in structural biology.
The protein distance matrix contains the distances between residues, which can be represented as a grayscale image, where the distances between pairs of Cα atoms are represented by intensity.
A weaker correlation is observed between domain size and the ratio of unique words in the repeat protein structures (R = -0.63) than in the domains that are not part of the RepeatsDB database (R = -0.80); see S2 Fig. Overall, these results suggest that the ratio of unique words of solenoid and tandem repeat proteins is shifted towards lower ratios.

Summary

Introduction

The analysis of protein structures using the distance matrix of Cα atoms has a long history in structural biology. Protein distance matrices have been used for structural alignment, protein classification, and finding homologous proteins [1, 2]. Tremendous progress has been made in predicting 3D protein structures based on distance matrices and artificial intelligence [3, 4, 5, 6]. Various studies have shown that representing a protein structure in 2D space has two main advantages: it captures local, short-, medium-, and long-range contacts between Cα atoms simultaneously, and it is invariant to rotation and translation [7]. The protein distance matrix contains the distances between residues, which can be represented as a grayscale image, where the distances between pairs of Cα atoms are represented by intensity. Feature extraction can be applied to obtain points of interest
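As a rough illustration of the representation described above, the following sketch builds a Cα distance matrix, maps it to 8-bit intensities, and checks that a rigid-body move leaves the matrix unchanged. The helix-like toy coordinates, the linear scaling to the range 0 to 255, and all function names are assumptions for this example; the source does not state which intensity mapping was used.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def to_grayscale(matrix):
    """Linearly map distances to 0..255 (assumed convention: 255 = largest distance)."""
    max_distance = max(max(row) for row in matrix)
    if max_distance == 0:
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * d / max_distance) for d in row] for row in matrix]

def rotate_z_and_translate(coords, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Hypothetical helix-like C-alpha trace (not taken from any real structure).
ca_coords = [(2.3 * math.cos(1.745 * i), 2.3 * math.sin(1.745 * i), 1.5 * i)
             for i in range(8)]

dm = distance_matrix(ca_coords)
image = to_grayscale(dm)

# The same chain after a rigid-body move yields the same distance matrix.
moved = rotate_z_and_translate(ca_coords, 0.7, (10.0, -4.0, 3.5))
dm_moved = distance_matrix(moved)
max_diff = max(abs(a - b) for ra, rb in zip(dm, dm_moved) for a, b in zip(ra, rb))
```

Here `max_diff` stays at the level of floating-point rounding, which is the rotation and translation invariance mentioned in the text; the main diagonal of `image` is zero because each atom has zero distance to itself.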



Journal: PLOS ONE
Publication Date: Feb 4, 2022
License type: CC BY 4.0
