Comprehensive analysis of beta-lactamase genes in clinical strains of Escherichia coli and Klebsiella pneumoniae: molecular characterization, and in silico predictions.
The emergence of beta-lactamase producing multidrug-resistant (MDR) gram-negative bacteria presents a significant challenge to effective treatment of infections. This study focuses on the isolation, amplification, and molecular characterization of β-lactamase genes from clinical strains of Escherichia coli and Klebsiella pneumoniae. Seven new partial gene sequences, including novel variants of blaOXA and blaNDM, were identified after screening 108 clinical samples and submitted to NCBI GenBank. In silico analysis revealed considerable diversity and distribution of these resistance genes among different strains of bacteria. Gene structure predictions using GENSCAN showed that blaOXA genes typically contain single exons with moderate GC content, whereas blaNDM genes feature longer exons with higher GC content. Multiple sequence alignment showed that NDM and OXA β-lactamases were highly similar, with only slight differences in a few amino acids. The study also analyzed the physico-chemical properties, functional domains, and phosphorylation patterns of the β-lactamase proteins. Secondary structure prediction indicated a dominance of beta sheets, contributing to protein stability, while tertiary modeling provided insights into their 3D structure. Overall, these findings provide critical insights into the genetic diversity and potential mechanisms of β-lactamase-mediated resistance, offering valuable information for the development of novel therapeutic strategies and surveillance programs.
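The GC content comparison mentioned above can be illustrated with a minimal Python sketch. This is not the GENSCAN workflow used in the study; it only shows how GC content could be computed for a nucleotide fragment. The function name `gc_content` and the example sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Hypothetical fragment for illustration only; not a sequence from the study.
example_fragment = "ATGCGTNNGGCC"
print(f"GC content: {gc_content(example_fragment):.1%}")
```

Ambiguous symbols such as N are ignored here so that low-quality positions do not bias the ratio; whether that is appropriate depends on the data.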
- Research Article
82
- 10.1002/(sici)1097-0134(199606)25:2<169::aid-prot3>3.0.co;2-d
- Jun 1, 1996
- Proteins: Structure, Function, and Bioinformatics
The success rates reported for secondary structural class prediction with different methods are contradictory. On one side, the problem of recognizing the secondary structural class of a protein knowing only its amino acid composition appears completely solved by simply applying jury decision with an elliptically scaled distance function. Chou and coworkers repeatedly (see Crit. Rev. Biochem. Mol. Biol. 30:275-349, 1995) published prediction accuracies near 100%. On the other hand, traditional secondary structure prediction techniques achieve success rates of about 70% for the secondary structural state per residue and about 75% for structural class only with extensive input information (full sequence of the query protein, its amino acid composition and length, multiple alignments with homologous sequences). In this article, we resolve the paradox and consider (1) the question of the secondary structural class definition, (2) the role of the representativity of the test set of protein tertiary structure for the current state of the Protein Data Bank (PDB); and (3) we estimate the real impact of amino acid composition on secondary structural class. We formulate three objective criteria for a reasonable definition of secondary structural classes and show that only the criterion of Nakashima et al. (J. Biochem. 99:153-162, 1986) complies with all of them. Only this definition matches the distribution of secondary structural content in representative PDB subsets, whereas other criteria leave many proteins (up to 65% of all PDB entries) simply unassigned. We review critically specialized secondary-structural class prediction methods, especially those of Chou and coworkers, which claim almost 100% accuracy using only amino acid composition, and resolve the paradox that these prediction accuracies are better than those from secondary structure predictions from multiple alignments. 
We show (i) that these techniques rely on a preselection of test sets which removes irregular proteins and other proteins without any class assignment (about 35% of all PDB entries); and (ii) that even for preselected representative test sets, the success rate drops to 60% and lower for a 4-type classification (alpha, beta, alpha + beta, alpha/beta). The prediction accuracies fall to about 50% if the secondary structural class definition of Nakashima et al. is applied and only few irregular proteins are preselected and removed from automatically generated, representative subsets of the PDB. We have applied two new vector decomposition methods for secondary structural content prediction from amino acid composition alone, with and without consideration of amino acid compositional coupling in the learning set of tertiary structures respectively, to the problem of class prediction and achieve about 60% correct assignment among four classes (alpha, beta, mixed, irregular) as well as single sequence-based secondary structure prediction methods like GORIII and COMBI. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein and that consideration of compositional coupling does not improve the prediction success. The prediction program SSCP offering secondary structural class assignment for query compositions and sequences has been made available as a World Wide Web and E-mail service.
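To make the idea of class prediction from amino acid composition concrete, here is a minimal Python sketch. It is not the SSCP program, the vector decomposition method, or the elliptically scaled distance function discussed above: it assumes a plain Euclidean nearest-centroid rule, and the training sequences, class labels, and function names are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def centroid(vectors):
    """Average a list of equally long composition vectors component by component."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def nearest_class(query_seq, centroids):
    """Assign the class whose centroid is closest in plain Euclidean distance."""
    query = composition(query_seq)
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))


# Hypothetical toy learning set; real methods use representative PDB subsets.
training = {
    "alpha": ["AEELLKKAEELLKK", "MAELLEKALEEA"],
    "beta": ["VTVTVYVSVTIV", "TVKVYVGTWTV"],
}
centroids = {
    label: centroid([composition(s) for s in seqs])
    for label, seqs in training.items()
}
print(nearest_class("AELLKKLAEE", centroids))
```

A toy set like this separates trivially; the abstract's point is that on representative test sets the achievable accuracy from composition alone is far lower.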
- Research Article
- 10.36347/sjet.2024.v12i07.005
- Jul 25, 2024
- Scholars Journal of Engineering and Technology
Objective: To analyze the structure and properties of Klebsiella pneumoniae and its encoded proteins. Methods: The gene of Klebsiella pneumoniae and the sequence and structure of its encoded proteins were analyzed and predicted using various information analysis tools from NCBI, ExPASy, and other websites. The work covered gene sequence analysis (homology analysis, multiple sequence comparison, conserved region analysis, gene structure prediction, gene annotation, enzyme cleavage site analysis, primer design, six-frame translation, etc.), protein sequence analysis and structure prediction (primary structure analysis, subcellular localization, signal peptide, transmembrane information, secondary structure prediction, three-dimensional structural homology modeling, etc.), and molecular phylogenetic analysis (construction of a phylogenetic tree). Conclusion: We successfully analyzed and predicted the sequence and structure of Klebsiella pneumoniae and its encoded proteins. The results provide a reference for in-depth study of the biological properties of SapC, the ABC transporter permease of Klebsiella pneumoniae, for establishing a rapid detection method for the bacterium, and for selecting targets for subunit and nucleic acid vaccines, and they lay a foundation for further understanding and use of this gene.
- Research Article
11
- 10.3389/fbioe.2022.901018
- Jul 22, 2022
- Frontiers in bioengineering and biotechnology
Prediction of the protein secondary structure is a key issue in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that can map the amino acid sequence into the secondary structure so that the protein secondary structure can be obtained according to the amino acid sequence. Driven by deep learning, the prediction accuracy of the protein secondary structure has been greatly improved in recent years. To explore a new technique of PSSP, this study introduces the concept of an adversarial game into the prediction of the secondary structure, and a conditional generative adversarial network (GAN)-based prediction model is proposed. We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator to generate the secondary structure, and then a discriminator is designed to conflict with the generator to learn the complicated features of proteins. Then, we propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.
- Abstract
- 10.1016/j.bpj.2017.11.2393
- Feb 1, 2018
- Biophysical Journal
Combining Prediction of Protein Aggregation Propensities with Prediction of Other One-Dimensional Properties
- Research Article
2
- 10.2174/18750362-v16-e230711-2023-2
- Jul 26, 2023
- The Open Bioinformatics Journal
The Mycobacterium tuberculosis complex (MTBC) bacteria include the slowly growing, host-associated bacteria Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium microti, Mycobacterium africanum, and Mycobacterium pinnipedii.
- Research Article
117
- 10.1186/1471-2105-8-201
- Jun 14, 2007
- BMC Bioinformatics
Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at the address .
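The "simple structural frequency profiles" described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the authors' system: templates are assumed to be already aligned to the query, each written as a string over H (helix), E (strand), C (coil) and `-` (gap), and the function name `frequency_profile` is hypothetical.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil


def frequency_profile(aligned_templates):
    """Per query position, return the frequency of each state among the templates.

    Positions covered by no template (gaps only) yield None.
    """
    if not aligned_templates:
        raise ValueError("at least one template is required")
    length = len(aligned_templates[0])
    if any(len(t) != length for t in aligned_templates):
        raise ValueError("all aligned templates must have the same length")

    profile = []
    for i in range(length):
        counts = Counter(t[i] for t in aligned_templates if t[i] in STATES)
        total = sum(counts.values())
        if total == 0:
            profile.append(None)  # no structural information at this position
        else:
            profile.append({s: counts[s] / total for s in STATES})
    return profile


# Hypothetical aligned template annotations; "-" marks an alignment gap.
templates = ["HHHE-C", "HHEE-C", "HCEE-C"]
for position, freqs in enumerate(frequency_profile(templates)):
    print(position, freqs)
```

In a system like the one described, such a profile would be fed to the predictor as an additional input alongside the sequence; positions returning None would fall back to sequence-only prediction.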
- Book Chapter
27
- 10.1007/978-3-319-12883-2_19
- Nov 30, 2014
Correct prediction of secondary and tertiary structure of proteins is one of the major challenges in bioinformatics/computational biological research. Predicting the correct secondary structure is the key to predict a good/satisfactory tertiary structure of the protein which not only helps in prediction of protein function but also in prediction of sub-cellular localization. This chapter aims to explain the different algorithms and methodologies, which are used in secondary structure prediction. Similarly, tertiary structure prediction has also emerged as one of developing areas of bioinformatics/computational biological research owing to the large gap between the available number of protein sequences and the known experimentally solved structures. Because of time and cost intensive experimental methods, experimentally determined structures are not available for vast majority of the available protein sequences present in public domain databases. The primary aim of this chapter is to offer a detailed conceptual insight to the algorithms used for protein secondary and tertiary structure prediction. This chapter systematically illustrates flowchart for selecting the most accurate prediction algorithm among different categories for the target sequence against three categories of tertiary structure prediction methods. Out of the three methods, homology modeling which is considered as most reliable method is discussed in detail followed by strengths and limitations for each of these categories. This chapter also explains different practical and conceptual problems, obstructing the high accuracy of the protein structure in each of the steps for all the three methods of tertiary structure prediction. The popular hybrid methodologies which further club together a number of features such as structural alignments, solvent accessibility and secondary structure information are also discussed. 
Moreover, this chapter describes the meta-servers that generate a consensus result from many servers to build a protein model of high accuracy. Lastly, the scope for further research to bridge existing gaps and to develop better secondary and tertiary structure prediction algorithms is also highlighted.
- Research Article
- 10.1007/s0089490050078
- Apr 1, 1999
- Journal of Molecular Modeling
Three different strategies to tackle mispredictions from incorrect secondary structure prediction are analysed using 21 small proteins (22-121 amino acids; 1-6 secondary structure elements) with known three dimensional structures: (1) Testing accuracy of different secondary structure predictions and improving them by combinations, (2) correcting mispredictions exploiting protein folding simulations with a genetic algorithm and (3) applying and combining experimental data to refine predictions both for secondary structure and tertiary fold. We demonstrate that predictions from secondary structure prediction programs can be efficiently combined to reduce prediction errors from missed secondary structure elements. Further, up to two secondary structure elements (helices, strands) missed by secondary structure prediction were corrected by the genetic algorithm simulation. Finally, we show how input from experimental data is exploited to refine the predictions obtained.
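One simple way to combine several secondary structure predictions is a per-residue majority vote. The abstract does not state which combination rule was used, so the Python sketch below is an assumption for illustration; the function name `consensus`, the tie-breaking rule (fall back to coil), and the example strings are hypothetical.

```python
from collections import Counter


def consensus(predictions, tie_state="C"):
    """Combine equally long H/E/C prediction strings by per-residue majority vote.

    When the top vote is tied, tie_state is used (an arbitrary choice here).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")

    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        best_count = ranked[0][1]
        winners = [state for state, n in ranked if n == best_count]
        result.append(winners[0] if len(winners) == 1 else tie_state)
    return "".join(result)


# Hypothetical outputs of three prediction programs for the same 8-residue peptide.
programs = ["HHHCCEEE", "HHCCCEEC", "CHHHCEEE"]
print(consensus(programs))
```

A vote of this kind can recover an element that a single program misses, as long as most of the other programs predict it.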
- Research Article
6
- 10.1186/1471-2105-6-261
- Oct 21, 2005
- BMC Bioinformatics
Background: Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span across several genomic clones. Results: We present an mRNA/DNA homology based gene structure prediction tool, GIGOgene. We use a new affine gap penalty splice-enhanced global alignment algorithm running in linear memory for a high quality annotation of splice sites. Our tool includes a novel algorithm to assemble partial gene structure predictions using interval graphs. GIGOgene exhibited a sensitivity of 99.08% and a specificity of 99.98% on the Genie learning set, and demonstrated a higher quality of gene structural prediction when compared to Sim4, est2genome, Spidey, Galahad and BLAT, including when genes contained micro-exons and non-canonical splice sites. GIGOgene showed an acceptable loss of prediction quality when confronted with a noisy Genie learning set simulating ESTs. Conclusion: GIGOgene shows a higher quality of gene structure prediction for mRNA/DNA spliced alignment when compared to other available tools.
- Research Article
17
- 10.1002/prot.22646
- Dec 10, 2009
- Proteins: Structure, Function, and Bioinformatics
The accurate prediction of protein structure, both secondary and tertiary, is an ongoing problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues in an identical fashion independent of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. Free modeling results from recent critical assessment of techniques for protein structure prediction (CASP) experiments are evaluated; as is the most up-to-date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy is dependent on sequence position. Both secondary structure and tertiary structure predictions are more accurate in regions near the amino(N)-terminus when compared with analogous regions near the carboxy(C)-terminus. Eight of 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N-terminus. CASP data shows a similar bias, with N-terminal fragments being predicted more accurately than fragments from the C-terminus. Two analogous fragments are taken from each model, the N-terminal fragment begins at the start of the most N-terminal secondary structure element (SSE), whereas the C-terminal fragment finishes at the end of the most C-terminal SSE. Each fragment is locally superimposed onto its respective native fragment. The relative terminal prediction accuracy (RMSD) is calculated on an intramodel basis. At a fragment length of 20 residues, the N-terminal fragment is predicted with greater accuracy in 79% of cases.
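The position dependence described above can be checked for secondary structure with a small Python sketch that compares per-residue accuracy in equally long N-terminal and C-terminal fragments. This is a simplified stand-in, not the EVA or CASP protocol: it uses fixed-length fragments from the chain ends rather than fragments anchored on secondary structure elements, and it does not cover the RMSD-based tertiary comparison. The function names and example strings are hypothetical.

```python
def match_fraction(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("fragments must be non-empty and equally long")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)


def terminal_accuracies(predicted, observed, length=20):
    """Return (N-terminal accuracy, C-terminal accuracy) for fragments of given length."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not 0 < length <= len(predicted):
        raise ValueError("fragment length must be between 1 and the chain length")
    n_term = match_fraction(predicted[:length], observed[:length])
    c_term = match_fraction(predicted[-length:], observed[-length:])
    return n_term, c_term


# Hypothetical predicted and observed H/E/C strings for a 12-residue chain.
predicted = "HHHHCCCCEEEE"
observed = "HHHHCCCCEECC"
n_acc, c_acc = terminal_accuracies(predicted, observed, length=4)
print(f"N-terminal: {n_acc:.2f}, C-terminal: {c_acc:.2f}")
```

Averaging such pairs over many chains and prediction servers would be needed before drawing any conclusion about a terminal bias.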
- Conference Article
1
- 10.1109/bibm.2014.6999394
- Nov 1, 2014
Determining the structure of ribosomal RNAs (rRNAs) is one of the crucial steps in understanding the process of protein synthesis, for which rRNAs are one of the basic components. Nevertheless, due to extreme technical difficulties, spatial (3D) structures have been resolved experimentally for only 14 organisms. Also, computational prediction of 3D rRNA structure is almost impossible, and prediction of secondary structure (the list of base pairs in the folded RNA), an important intermediate step between sequence and 3D structure that is used broadly in modeling of RNA structures, is in the case of rRNAs hindered by both extreme sequence length and high structure complexity. Here we present a proof-of-concept for an rRNA secondary structure prediction method that utilizes known structures as structural templates. Our template-based prediction algorithm determines those regions of the sequence for which structure is being predicted that are conserved well enough so that their secondary structure can be copied over from the template. The structure of the remaining, unconserved regions is predicted using a thermodynamic folding model. Applying a baseline implementation of our algorithm to the E. coli 16S rRNA, we have achieved state-of-the-art recall and precision using the structure of T. thermophilus 16S rRNA as a template.
- Research Article
4
- 10.1016/j.bbapap.2008.01.020
- Feb 20, 2008
- BBA - Proteins and Proteomics
Fourier transform coupled to tryptophan-scanning mutagenesis: Lessons from its application to the prediction of secondary structure in the acetylcholine receptor lipid-exposed transmembrane domains
- Research Article
90
- 10.1002/pro.5560040713
- Jul 1, 1995
- Protein Science
This work provides a systematic comparison of vibrational CD (VCD) and electronic CD (ECD) methods for spectral prediction of secondary structure. The VCD and ECD data are simplified to a small set of spectral parameters using the principal component method of factor analysis (PC/FA). Regression fits of these parameters are made to the X-ray-determined fractional components (FC) of secondary structure. Predictive capability is determined by computing structures for proteins sequentially left out of the regression. All possible combinations of PC/FA spectral parameters (coefficients) were used to form a full set of restricted multiple regressions with the FC values, both independently for each spectral data set as well as for the two VCD sets and all the data grouped together. The complete search over all possible combinations of spectral parameters for different types of spectral data is a new feature of this study, and the focus on prediction is the strength of this approach. The PC/FA method was found to be stable in detail to expansion of the training set. Coupling amide II to amide I' parameters reduced the standard deviations of the VCD regression relationships, and combining VCD and ECD data led to the best fits. Prediction results had a minimum error when dependent on relatively few spectral coefficients. Such a limited dependence on spectral variation is the key finding of this work, which has ramifications for previous studies as well as suggests future directions for spectral analysis of structure. The best ECD prediction for helix and sheet uses only one parameter, the coefficient of the first subspectrum. With VCD, the best predictions sample coefficients of both the amide I' and II bands, but error is optimized using only a few coefficients. In this respect, ECD is more accurate than VCD for alpha-helix, and the combined VCD (amide I' + II) predicts the beta-sheet component better than does ECD. 
Combining VCD and ECD data sets yields exceptionally good predictions by utilizing the strengths of each. However, the residual error, its distribution, and, most importantly, the lack of dependence of the method on many of the significant components derived from the spectra leads to the conclusion that the heterogeneity of protein structure is a fundamental limitation to the use of such spectral analysis methods. The underutilization of these data for prediction of secondary structure suggests spectral data could predict a more detailed descriptor.
- Book Chapter
6
- 10.1093/oso/9780199634972.003.0004
- Nov 28, 1996
The prediction of protein secondary structure is a major part of the general protein folding problem and is the most general method of obtaining some structural information from any newly-determined sequence. Secondary structure prediction is useful in a wide variety of problems concerned with proteins.
- Research Article
126
- 10.1017/s1355838202554066
- Jun 1, 2002
- RNA
The existence and functional importance of RNA secondary structure in the replication of positive-stranded RNA viruses is increasingly recognized. We applied several computational methods to detect RNA secondary structure in the coding region of hepatitis C virus (HCV), including thermodynamic prediction, calculation of free energy on folding, and a newly developed method to scan sequences for covariant sites and associated secondary structures using a parsimony-based algorithm. Each of the prediction methods provided evidence for complex RNA folding in the core- and NS5B-encoding regions of the genome. The positioning of covariant sites and associated predicted stem-loop structures coincided with thermodynamic predictions of RNA base pairing, and localized precisely in parts of the genome with marked suppression of variability at synonymous sites. Combined, there was evidence for a total of six evolutionarily conserved stem-loop structures in the NS5B-encoding region and two in the core gene. The virus most closely related to HCV, GB virus-B (GBV-B) also showed evidence for similar internal base pairing in its coding region, although predictions of secondary structures were limited by the absence of comparative sequence data for this virus. While the role(s) of stem-loops in the coding region of HCV and GBV-B are currently unknown, the structure predictions in this study could provide the starting point for functional investigations using recently developed self-replicating clones of HCV.