Abstract

Background

As a marker of Helicobacter pylori, Cytotoxin-associated gene A (cagA) has been shown to be the major virulence factor causing gastroduodenal diseases. However, the molecular mechanisms by which cagA-positive H. pylori infection leads to different gastroduodenal diseases remain unknown. Current studies are limited to evaluating the correlation between disease and the number of Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs in CagA. To further understand the relationship between the CagA sequence and its virulence in gastric cancer, we proposed a systematic entropy-based approach to identify cancer-related residues in the intervening regions of CagA, and employed a supervised machine learning method to classify cancer and non-cancer cases.

Methodology

An entropy-based calculation was used to detect key residues of the CagA intervening sequences as gastric cancer biomarkers. For each residue, both the combinatorial entropy and the background entropy were calculated, and the entropy difference was used as the criterion for selecting feature residues. The feature values were then fed into a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, and two parameters were tuned by grid search to obtain the optimal F value. Two other popular sequence classification methods, BLAST and HMMER, were also applied to the same data for comparison.

Conclusion

Our method achieved 76% and 71% classification accuracy for the Western and East Asian subtypes, respectively, performing significantly better than BLAST and HMMER. This research indicates that small amino acid variations at these key residues might cause differences in virulence among CagA strains, resulting in different gastroduodenal diseases. This study provides not only a useful tool for predicting the correlation between novel CagA strains and diseases, but also a general framework for detecting biological sequence biomarkers in population studies.
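The residue-selection step described under Methodology can be sketched as follows. The abstract does not give the entropy formulas, so this is a minimal sketch under an explicit assumption: the combinatorial entropy of an alignment column is taken as the log of the number of distinct residue arrangements within each class (cancer vs. non-cancer), summed over classes, and the background entropy as the same quantity for the pooled column with class labels ignored. Under this assumption the difference is never negative, is zero for invariant columns, and grows as residue identity tracks the class label more closely. The function names (`log_multinomial`, `entropy_difference`, `rank_residues`) and the toy alignment are hypothetical, not the authors' code or data.

```python
import math
from collections import Counter


def log_multinomial(residues):
    """Natural log of the number of distinct orderings of a residue multiset,
    ln(n! / prod_a n_a!), computed with lgamma to avoid huge factorials."""
    counts = Counter(residues)
    n = sum(counts.values())
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts.values())


def entropy_difference(column, labels):
    """Background entropy minus combinatorial entropy for one alignment column.

    ASSUMPTION (not from the source): combinatorial entropy is the sum of the
    within-class log-multinomials; background entropy is the log-multinomial
    of the pooled column. Invariant columns score 0; larger values mean the
    residue identity follows the class label more closely."""
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(label, []).append(residue)
    combinatorial = sum(log_multinomial(g) for g in groups.values())
    background = log_multinomial(column)
    return background - combinatorial


def rank_residues(alignment, labels, top_k):
    """Return the top_k column indices of equal-length aligned sequences,
    ordered by decreasing entropy difference (ties broken by position)."""
    scores = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        scores.append((entropy_difference(column, labels), i))
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scores[:top_k]]
```

For example, with the hypothetical aligned fragments `["AKE", "AKD", "GKE", "GKD"]` and labels `[1, 1, 0, 0]`, `rank_residues(alignment, labels, 2)` returns `[0, 2]`: position 0 separates the classes perfectly and ranks first, position 2 is mixed within both classes and ranks second, and the invariant position 1 scores zero. Gap characters, if present, are treated as an ordinary symbol in this sketch.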


Summary

Introduction

Helicobacter pylori (H. pylori) is a Gram-negative, helix-shaped bacterium that inhabits the human stomach and infects more than half of the world's population [1,2,3]. Recent studies have shown that it is associated with gastroduodenal diseases, including duodenal ulcers [4], gastric ulcers [5] and chronic gastritis, and that it is a significant risk factor for developing gastric cancer [6,7,8]. Cytotoxin-associated gene A (cagA), a marker of H. pylori, has been shown to be the major virulence factor causing gastroduodenal diseases. By activating mitogen-activated protein kinase (MAPK), extracellular signal-regulated kinase (ERK) [17] and focal adhesion kinase (FAK), CagA can cause cell dissociation and infiltrative tumor growth [18,19,20,21]; these effects make CagA a key virulence factor in H. pylori [22]. To further understand the relationship between the CagA sequence and its virulence in gastric cancer, we proposed a systematic entropy-based approach to identify cancer-related residues in the intervening regions of CagA, and employed a supervised machine learning method to classify cancer and non-cancer cases.
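The abstract specifies the classification step only as an SVM with an RBF kernel whose two parameters were tuned by grid search to maximize the F value; it does not name the parameters. For an RBF SVM they are conventionally the penalty C and the kernel width γ, and "F value" is read here as the F1 score; both readings are assumptions. The sketch below shows the pieces that surround the classifier (the RBF kernel, the F-measure and an exhaustive grid over (C, γ)) using only the Python standard library. Training the SVM itself is delegated to a caller-supplied `evaluate(C, gamma)` function, which is hypothetical and could wrap, for example, scikit-learn's `SVC(kernel="rbf", C=C, gamma=gamma)`.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """Radial Basis Function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class,
    e.g. positive = cancer case. Returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(evaluate, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, C, gamma).

    `evaluate(C, gamma)` is a hypothetical caller-supplied function that
    trains an RBF-kernel SVM with those parameters and returns its F value
    on held-out data; the first pair wins ties."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best
```

Exponentially spaced grids (for example powers of 2) are a common choice for C and γ. Each `evaluate` call should score on held-out data or by cross-validation, so that the selected pair is not simply the one that best fits the training set.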

