International Dataset Research Articles

Antimicrobial resistance (AMR) in Escherichia coli is a global problem associated with substantial morbidity and mortality. AMR-associated genes are typically annotated based on similarity to variants in a curated reference database, with the implicit assumption that uncatalogued genetic variation within these is phenotypically unimportant. In this study, we evaluated the performance of the AMRFinder tool and, subsequently, the potential for discovering new AMR-associated gene families and characterising variation within existing ones to improve genotype-to-susceptibility phenotype predictions in E coli. In this cross-sectional study of international genome sequence data, we assembled a global dataset of 9001 E coli sequences from five publicly available data collections predominantly deriving from human bloodstream infections from Norway, Oxfordshire (UK), Thailand, the UK, and Sweden. 8555 of these sequences had linked antibiotic susceptibility data. Raw reads were assembled using Shovill and AMR genes (relevant to amoxicillin-clavulanic acid, ampicillin, ceftriaxone, ciprofloxacin, gentamicin, piperacillin-tazobactam, and trimethoprim) extracted using the National Center for Biotechnology Information AMRFinder tool (using both default and strict [100%] coverage and identity filters). We assessed the predictive value of the presence of these genes for predicting resistance or susceptibility against US Food and Drug Administration thresholds for major and very major errors. Mash was used to calculate the similarity between extracted genes using Jaccard distances. We empirically reclustered extracted gene sequences into AMR-associated gene families (≥70% match) and antibiotic-resistance genes (ARGs; 100% match) and categorised these according to their frequency in the dataset. Accumulation curves were simulated and correlations between gene frequency in the Oxfordshire and other datasets calculated using the Spearman coefficient.
Firth regression was used to model the association between the presence of blaTEM-1 variants and amoxicillin-clavulanic acid or piperacillin-tazobactam resistance, adjusted for the presence of other relevant ARGs. The performance of the AMRFinder database for genotype-to-phenotype predictions using strict 100% identity and coverage thresholds did not meet US Food and Drug Administration thresholds for any of the seven antibiotics evaluated. Relaxing filters to default settings improved sensitivity with a specificity cost. For all antibiotics, most explainable resistance was associated with the presence of a small number of genes. There was a proportion of resistance that could not be explained by known ARGs; this ranged from 75·1% for amoxicillin-clavulanic acid to 3·4% for ciprofloxacin. Only 18 199 (51·5%) of the 35 343 ARGs detected had a 100% identity and coverage match in the AMRFinder database. After empirically reclassifying genes at 100% nucleotide sequence identity, we identified 1042 unique ARGs, of which 126 (12·1%) were present ten times or more, 313 (30·0%) were present between two and nine times, and 603 (57·9%) were present only once. Simulated accumulation curves revealed that discovery of new (100% match) ARGs present more than once in the dataset plateaued relatively quickly, whereas new singleton ARGs were discovered even after many thousands of isolates had been included. We identified a strong correlation (Spearman coefficient 0·76 [95% CI 0·73-0·80], p<0·0001) between the number of times an ARG was observed in Oxfordshire and the number of times it was seen internationally, with ARGs that were observed six times in Oxfordshire always being found elsewhere.
Finally, using the example of blaTEM-1, we showed that uncatalogued variation, including synonymous variation, is associated with potentially important phenotypic differences; for example, two common, uncatalogued blaTEM-1 alleles with only synonymous mutations compared with the known reference were associated with reduced resistance to amoxicillin-clavulanic acid (adjusted odds ratio 0·58 [95% CI 0·35-0·95], p=0·031) and piperacillin-tazobactam (0·50 [95% CI 0·29-0·82], p=0·005). We highlight substantial uncatalogued genetic variation with respect to known ARGs, although a relatively small proportion of these alleles are repeatedly observed in a large international dataset, suggesting strong selection pressures. The current approach of using fuzzy matching for ARG detection, ignoring the unknown effects of uncatalogued variation, is unlikely to be acceptable for future clinical deployment. The association of synonymous mutations with potentially important phenotypic differences suggests that relying solely on amino acid-based gene detection to predict resistance is unlikely to be sufficient. Finally, the inability to explain all resistance using existing knowledge highlights the importance of new target gene discovery. Funding: National Institute for Health and Care Research, Wellcome, and UK Medical Research Council.
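
The abstract above scores genotype-based predictions by their major and very major error rates. As a rough illustration only, the sketch below computes the two rates from paired calls, using the usual definitions: a very major error is a phenotypically resistant isolate predicted susceptible, and a major error is a phenotypically susceptible isolate predicted resistant. The function names, the example data, and the threshold parameters are hypothetical and not taken from the study; the acceptance limits are left as arguments because the abstract does not state them.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    very_major: float  # resistant isolates predicted susceptible / all resistant
    major: float       # susceptible isolates predicted resistant / all susceptible


def error_rates(phenotype_resistant, predicted_resistant):
    """Compute very major and major error rates from paired boolean calls.

    phenotype_resistant[i] is True if isolate i tested resistant in the lab;
    predicted_resistant[i] is True if a relevant gene was detected.
    """
    pairs = list(zip(phenotype_resistant, predicted_resistant))
    n_resistant = sum(1 for pheno, _ in pairs if pheno)
    n_susceptible = len(pairs) - n_resistant
    very_major = sum(1 for pheno, pred in pairs if pheno and not pred)
    major = sum(1 for pheno, pred in pairs if not pheno and pred)
    return ErrorRates(
        very_major / n_resistant if n_resistant else float("nan"),
        major / n_susceptible if n_susceptible else float("nan"),
    )


def meets_thresholds(rates, max_very_major, max_major):
    """Check both rates against caller-supplied limits (assumed, not from the source)."""
    return rates.very_major <= max_very_major and rates.major <= max_major


# Made-up example: 4 resistant and 6 susceptible isolates.
phenotype = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
rates = error_rates(phenotype, predicted)
```

In this made-up example one of four resistant isolates is missed and one of six susceptible isolates is flagged, so both rates would fail any strict limit passed to `meets_thresholds`.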

This investigation focuses on refining software effort estimation (SEE) to enhance project outcomes amidst the rapid evolution of the software industry. Accurate estimation is a cornerstone of project success, crucial for avoiding budget overruns and minimizing the risk of project failures. The framework proposed in this article addresses three significant issues that are critical for accurate estimation: dealing with missing or inadequate data, selecting key features, and improving the software effort model. Our proposed framework incorporates three methods: the Novel Incomplete Value Imputation Model (NIVIM), a hybrid model using Correlation-based Feature Selection with a meta-heuristic algorithm (CFS-Meta), and the Heterogeneous Ensemble Model (HEM). The combined framework synergistically enhances the robustness and accuracy of SEE by effectively handling missing data, optimizing feature selection, and integrating diverse predictive models for superior performance across varying project scenarios. The framework significantly reduces imputation and feature selection overhead, while the ensemble approach optimizes model performance through dynamic weighting and meta-learning. This results in lower mean absolute error (MAE) and reduced computational complexity, making it more effective for diverse software datasets. NIVIM is engineered to address incomplete datasets prevalent in SEE. By integrating a synthetic data methodology through a Variational Auto-Encoder (VAE), the model incorporates both contextual relevance and intrinsic project features, significantly enhancing estimation precision. Comparative analyses reveal that NIVIM surpasses existing models such as VAE, GAIN, K-NN, and MICE, achieving statistically significant improvements across six benchmark datasets, with average RMSE improvements ranging from 11.05% to 17.72% and MAE improvements from 9.62% to 21.96%. 
Our proposed method, CFS-Meta, balances global optimization with local search techniques, substantially enhancing predictive capabilities. The proposed CFS-Meta model was compared to single and hybrid feature selection models to assess its efficiency, demonstrating up to a 25.61% reduction in MSE. Additionally, the proposed CFS-Meta achieves a 10% (MAE) improvement against the hybrid PSO-SA model, an 11.38% (MAE) improvement compared to the Hybrid ABC-SA model, and 12.42% and 12.703% (MAE) improvements compared to the hybrid Tabu-GA and hybrid ACO-COA models, respectively. Our third method proposes an ensemble effort estimation (EEE) model that amalgamates diverse standalone models through a Dynamic Weight Adjustment-stacked combination (DWSC) rule. Tested against international benchmarks and industry datasets, the HEM method has improved the standalone model by an average of 21.8% (Pred()) and the homogeneous ensemble model by 15% (Pred()). This comprehensive methodology underscores our model’s contributions to advancing software project management (SPM) through advanced predictive modeling, setting a new benchmark for software engineering effort estimation.
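
The results in this abstract are reported as relative improvements in MAE, RMSE, MSE, and Pred(). A minimal sketch of how such figures are commonly computed is given below, assuming the standard definitions of these metrics; the function names and sample values are illustrative and not drawn from the article, and the Pred level is left as a parameter because the abstract does not state it.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pred(actual, predicted, level):
    """Fraction of estimates whose relative error is at most `level` (e.g. 0.25).

    Assumes every actual effort value is non-zero.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / abs(a) <= level)
    return hits / len(actual)


def percent_improvement(baseline_error, new_error):
    """Relative reduction of an error metric, as a percentage of the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up effort values (e.g. person-hours) for three projects.
actual = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]
example_mae = mae(actual, estimated)
example_rmse = rmse(actual, estimated)
example_pred = pred(actual, estimated, 0.05)
```

Note that a percentage improvement in an error metric such as MAE is a reduction relative to the baseline, whereas an improvement in Pred is an increase in the share of estimates that fall within the chosen tolerance.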

Articles published on International Dataset

Characteristics of Technological Disasters.

Re-imagining educational quality. The need for a multidimensional approach in evaluating educational quality through TIMSS data

What are the key predictors of international teacher shortages?

Global air freight flow data for aviation policy modelling

Long-term natural history in type II and III spinal muscular atrophy: a 4-year international study on the Hammersmith Functional Motor Scale Expanded.

Toward Centrality Evaluation of Yearning Symptoms for Prolonged Grief Disorder: A Cross-Cultural Approach

Modelling crop management and environmental effects on the development of Leptosphaeria maculans pseudothecia

Exploring uncatalogued genetic variation in antimicrobial resistance gene families in Escherichia coli: an observational analysis

Framework to improve software effort estimation accuracy using novel ensemble rule

Multicenter cohort analysis of anoikis and EMT: implications for prognosis and therapy in lung adenocarcinoma

Beyond borders: The moderating role of cultural religiosity in the relationship between moral circle and generosity.

Human-in-the-Loop-A Deep Learning Strategy in Combination with a Patient-Specific Gaussian Mixture Model Leads to the Fast Characterization of Volumetric Ground-Glass Opacity and Consolidation in the Computed Tomography Scans of COVID-19 Patients.

Board Responsibility for Irresponsibility: The Link Between Board Structure and Corporate Scandals

Wait or pivot? Family and non-family firms’ strategic responses to COVID-19 and employment change

Does corporate sustainability performance matter for cash holdings? International evidence

An international, open-access dataset of dental wear patterns and associated broad age classes in archaeological cattle mandibles

Structural Brain Differences in the Alzheimer’s Disease Continuum: Insights Into the Heterogeneity From a Large Multisite Neuroimaging Consortium

Student-centered teaching across OECD countries: An ecological perspective

Follow the Leader: How Culture Gives Rise to a Behavioral Bias That Leads to Higher Greenhouse Gas Emissions

Is corporate community involvement associated with poverty and income inequality? International evidence
