Abstract

INTRODUCTION: Metagenomic sequencing of the human microbiome and other complex microbial communities reveals extensive heterogeneity at the sub-species level. Detecting these heterogeneities (and strains) in the human microbiome is particularly important because different strains of the same bacterial species often display distinct phenotypes, such as pathogenicity in humans and antimicrobial resistance. In the context of cancer, studies show that the host microbiome may slow or contribute to the development and progression of cancer as well as the patient response to certain cancer therapies. However, reconstruction of heterogeneous bacterial species (represented by multiple strains) from a single metagenomic sample is challenging. Here we present an algorithm for phasing and assembly of closely related strains from long reads, which could also be applied to phasing somatic variants in cancer.

METHODS: The goal of the Strainy phasing procedure is to cluster long reads based on their strain (or haplotype) of origin. Given a set of all reads aligned to a unitig, Strainy builds a connection graph, where nodes correspond to aligned reads, and edges connect reads that share the same SNP genotypes. Then, Strainy iteratively uses a community detection algorithm to partition the connection graph into densely connected components that are assembled into strain unitigs. Finally, Strainy uses the overlap graph approach to extend strain unitigs and integrate them back into the original de novo assembly graph.

RESULTS: We benchmark Strainy against Hifiasm-meta, Strainberry, metaFlye and metaMDBG using several simulated, mock and real datasets with ONT and PacBio HiFi reads. On simulated and mock metagenomic datasets, Strainy phased and reconstructed a substantially higher portion of unique strain sequence compared to the other tools, while having fewer errors. We then applied Strainy to untangle strains in metagenomic sequencing of activated sludge from an anaerobic digester bacterial community previously sequenced with Nanopore R9, R10 and PacBio HiFi. The number of recovered strains per bacterial species varied from 1 to 4. On average, 8.6 heterozygous structural variants per bacterial species were recovered. Non-synonymous to synonymous substitution rates (dN/dS) revealed a few hotspots with evidence of selection that were specific to different species. Strainy represents a first practical algorithmic framework for multi-allelic phasing with long reads. Tumor genomes often contain multiple clones defined by characteristic somatic variants. Phasing of somatic variants into clonal haplotypes is therefore a promising approach to better characterize clonal heterogeneity in cancer. We tested Strainy for multi-allelic phasing on a mix of multiple human genomes and successfully phased distinct haplotypes. Motivated by the promising results, we are working on extending our approach to cancer clone phasing.

Citation Format: Ataberk Donmez, Ekaterina Kazantseva, Maria Frolova, Mihai Pop, Mikhail Kolmogorov. Strainy: Multi-allelic phasing and assembly of bacterial strains and tumor clones using long reads [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 6299.
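
The read-clustering step described under METHODS can be illustrated with a small sketch. This is not Strainy's code: the abstract does not say which community detection algorithm is used or how edges are weighted, so the sketch assumes a simple weighted label propagation and an edge rule requiring agreement at every shared SNP position. It also runs the clustering once rather than iteratively. The function names (`build_connection_graph`, `label_propagation`, `phase_reads`), the `min_shared` parameter, and the toy reads are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_connection_graph(reads, min_shared=2):
    """Connect reads that agree at every SNP position they both cover.

    `reads` maps a read ID to {snp_position: allele}. The agreement rule and
    the `min_shared` threshold are assumptions made for illustration.
    """
    graph = {read_id: {} for read_id in reads}
    for a, b in combinations(sorted(reads), 2):
        shared = reads[a].keys() & reads[b].keys()
        if len(shared) < min_shared:
            continue  # too little overlap to compare genotypes
        if all(reads[a][pos] == reads[b][pos] for pos in shared):
            # Edge weight = number of SNP positions where the reads agree.
            graph[a][b] = graph[b][a] = len(shared)
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation (a stand-in community detector)."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            # Sum edge weights per label among this node's neighbors.
            scores = defaultdict(int)
            for neighbor, weight in graph[node].items():
                scores[labels[neighbor]] += weight
            if not scores:
                continue  # isolated read keeps its own label
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label on ties, otherwise take the smallest.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def phase_reads(reads, min_shared=2):
    """Cluster reads by putative strain; returns sorted lists of read IDs."""
    labels = label_propagation(build_connection_graph(reads, min_shared))
    clusters = defaultdict(list)
    for read_id, label in labels.items():
        clusters[label].append(read_id)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: two strains that differ at four SNP positions on one unitig.
toy_reads = {
    "r1": {100: "A", 200: "C", 300: "G"},
    "r2": {200: "C", 300: "G", 400: "T"},
    "r3": {100: "A", 200: "C", 300: "G", 400: "T"},
    "r4": {100: "G", 200: "T", 300: "A"},
    "r5": {200: "T", 300: "A", 400: "C"},
    "r6": {100: "G", 200: "T", 300: "A", 400: "C"},
}
clusters = phase_reads(toy_reads)
print(clusters)  # [['r1', 'r2', 'r3'], ['r4', 'r5', 'r6']]
```

On real long reads, sequencing errors would break the strict agreement rule, so a practical version would likely tolerate some mismatches per edge. That tolerance is left out here to keep the sketch short.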