Abstract

The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL-encoding gene. However, HbL proteins have so far been identified in only a small number of bacteria. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based on amino acid composition (AC), dipeptide composition (DC), a hybrid method (AC + DC), and position-specific scoring matrix (PSSM) profiles. In addition, we introduce, for the first time, a prediction method based on max to min amino acid residue (MM) profiles. The performance of the different approaches was estimated using fivefold cross-validation and assessed by average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) curve analysis. We also compared the performance of our proposed models against homology detection databases. All experimental results indicate that the proposed BacHbpred can be a promising predictor for identifying HbL-related proteins. BacHbpred is available as a web tool for HbL prediction.

Highlights

  • Hemoglobin, the oxygen carrying protein first discovered in humans, was thought to be present exclusively in eukaryotes, but this old paradigm changed when a Hb-like (HbL) protein was discovered in the bacterium Vitreoscilla [1]

  • We have developed a series of support vector machine (SVM) modules to predict HbL proteins with high accuracy

  • SVM modules have been developed for the prediction of HbL proteins using amino acid composition (AC), dipeptide composition (DC), a hybrid approach (AC + DC), position-specific scoring matrix (PSSM) profiles, and max to min amino acid residue (MM) profiles
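
As an illustration of the composition-based features named above, the following is a minimal sketch of how AC, DC, and hybrid (AC + DC) vectors could be computed from a protein sequence. It is not the BacHbpred implementation: the function names, the use of percentages rather than fractions, and the handling of non-standard residues are all assumptions made for this example. The PSSM and MM profiles are not sketched here because the text above does not define them.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in the same fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _standard_residues(seq):
    """Upper-case the sequence and drop non-standard symbols (assumption)."""
    return [r for r in seq.upper() if r in AMINO_ACIDS]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector: percentage of each amino acid."""
    residues = _standard_residues(seq)
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [100.0 * counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector: percentage of each adjacent pair."""
    residues = _standard_residues(seq)
    total_pairs = len(residues) - 1
    if total_pairs < 1:
        raise ValueError("need at least two standard residues")
    # Overlapping adjacent pairs: positions (0,1), (1,2), ...
    counts = Counter(a + b for a, b in zip(residues, residues[1:]))
    return [100.0 * counts[d] / total_pairs for d in DIPEPTIDES]


def hybrid_features(seq):
    """Concatenate AC and DC into one 420-dimensional vector (AC + DC)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the output shapes.
    features = hybrid_features("MLDQQTINIIKATVPVLKEH")
    print(len(features))
```

Vectors of this kind are fixed-length regardless of sequence length, which is what makes them usable as direct input to an SVM classifier. Note that dropping non-standard residues before pairing, as done here, can join residues that were not adjacent in the original sequence; a stricter variant would skip any pair that spans a removed symbol.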


Summary

Introduction

Hemoglobin, the oxygen-carrying protein first discovered in humans, was thought to be present exclusively in eukaryotes, but this old paradigm changed when a Hb-like (HbL) protein was discovered in the bacterium Vitreoscilla [1]. HbL proteins found in bacteria display large variations in their amino acid sequences and structural organization. However, the basic architecture of the globin fold and the amino acid residues needed to maintain a common structural organization are conserved throughout the globin family. Three distinct structural organizations have been observed in bacterial hemoglobins: single-domain HbL proteins exhibiting a classical globin-like fold, truncated HbL proteins displaying truncation in their helical structure, and chimeric HbL proteins in which the globin domain is integrated with other domains having different functions [3]. The chimeric HbL proteins (flavohemoglobins) have been further classified into three groups: (1) a globin domain with only distant similarity to the FAD domain (the FAD match is insignificant according to Pfam), (2) flavohemoglobin proteins containing an additional cytochrome reductase domain at their C-terminus and a FAD/NAD-binding FR-type domain, and (3) a globin with a FAD/NAD-binding FR-type domain.

