Articles published on Genome Assembly
- New
- Research Article
- 10.1080/23802359.2026.2635843
- Apr 3, 2026
- Mitochondrial DNA Part B
- Hye Been Kim + 3 more
Solanum carolinense Linnaeus, belonging to the family Solanaceae, is a perennial herb or subshrub. S. carolinense has become naturalized in Korea as an invasive species, forming a stable population that has grown naturally with native plants for more than 10 years. However, its chloroplast genome structure and complete sequence have not yet been reported. Therefore, we determined the complete chloroplast genome sequence of S. carolinense using genome sequencing, assembly, and annotation. The total length of the chloroplast genome was 155,315 bp with a GC content of 37.6%. It featured a quadripartite structure (a large single-copy region, 86,160 bp; a small single-copy region, 18,459 bp; and two inverted repeat regions, 25,348 bp each). It contains 129 genes, including 84 coding sequences (CDSs), 37 tRNA genes, 8 rRNA genes, and one pseudogene. Phylogenetic analysis of 78 CDSs revealed that S. carolinense is closely related to S. aridum Morong and S. hieronymi Kuntze. These results provide a molecular foundation for phylogenetic and evolutionary studies of the genus Solanum and present a fundamental chloroplast genomic resource for future invasion biology research.
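The region lengths reported in this abstract can be cross-checked against the stated total. The following is a minimal sketch, not part of the article: the function name `quadripartite_total` is hypothetical, and it assumes the usual convention that a quadripartite chloroplast genome consists of one large single-copy region, one small single-copy region, and two copies of the inverted repeat.

```python
# Region lengths (bp) as reported in the abstract above.
LSC = 86_160  # large single-copy region
SSC = 18_459  # small single-copy region
IR = 25_348   # each of the two inverted repeat regions


def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Return LSC + SSC + two IR copies (hypothetical helper, for illustration)."""
    return lsc + ssc + 2 * ir


total = quadripartite_total(LSC, SSC, IR)
print(total)  # 155315, matching the reported genome length of 155,315 bp
```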
- New
- Research Article
- 10.1016/j.ymeth.2026.02.001
- Apr 1, 2026
- Methods (San Diego, Calif.)
- Junyao Li + 7 more
DNA-regulated structural engineering of metal nanomaterials: A strategy for advanced optical biosensing.
- New
- Research Article
- 10.1016/j.dib.2025.112423
- Apr 1, 2026
- Data in brief
- Yuki Matsumoto + 3 more
Sika deer (Cervus nippon) is naturally distributed across East Asia and includes 14 subspecies, showing phenotypic and genetic diversity. In this study, we constructed a de novo genome assembly of wild sika deer using one of the largest subspecies, C. n. yesoensis. We used HiFi reads, the high-quality long reads from Pacific Biosciences, to build our novel genome assembly, CerNipYes1.0. The genome size of CerNipYes1.0 is estimated to be 3.1 Gb, which is 0.6 Gb larger than the previously reported sika deer genome assembly. The assembly comprises 1,810 scaffolds with an N50 length of 77 Mb. Compleasm, a genome completeness evaluation tool based on Benchmarking Universal Single-Copy Orthologs (BUSCO), indicated that 12,562 genes (99.75%) were complete when compared against the database. Our results indicate that CerNipYes1.0 is a valuable resource for studying the molecular biology, phylogeny and evolution of the Cervidae and its genome.
- New
- Research Article
- 10.1016/j.dib.2026.112454
- Apr 1, 2026
- Data in brief
- Bidyut R Mohapatra + 2 more
This study reports the whole-genome sequence data and functional annotations of a novel Stutzerimonas marianensis strain LB-0542 isolated from the decomposing pelagic Sargassum biomass stranded on Long Beach, Barbados. The genomic DNA was sequenced with the Illumina NextSeq2000 platform. The genome assembly was performed with the SPAdes Genome Assembler (ver 3.15.5). The assembled genome has a size of 4,520,813 bp, a coverage of 110X, a GC content of 63.2 %, an L50 of 2 and an N50 of 1,079,143 bp. The genome consists of 12 contigs, 0 CRISPR, 3 rRNA, 56 tRNA and 4166 CDSs (coding sequences) with a coding ratio of 89.4 %. The genome annotation results for the COG (cluster of orthologous genes) and subsystem features indicate that the metabolism and the amino acids and derivatives are the most dominant categories, respectively. The analysis of the genome for the existence of Carbohydrate-Active Enzymes (CAZymes) identified 230 genes encoding four functional classes of CAZymes [glycoside hydrolases (75 genes), glycosyltransferases (95 genes), carbohydrate esterases (9 genes) and carbohydrate-binding modules (51 genes)]. The functional annotation of the genome for plastic degradation revealed the presence of 34 genes, which could catalyse the degradation process of 14 types of plastics: polyethylene glycol [PEG (29 %)], polylactic acid [PLA (11 %)], poly(3-hydroxybutyrate-co-3-hydroxyvalerate) [PHBV (9 %)], polyhydroxyalkanoates [PHA (9 %)], polyethylene [PE (6 %)], polycaprolactone [PCL (6 %)], polyethersulfone [PES (6 %)], polyethylene terephthalate [PET (6 %)], poly(butylene adipate-co-terephthalate) [PBAT (3 %)], polystyrene [PS (3 %)], polybutylene succinate [PBSA (3 %)], poly(3-hydroxyvalerate) [P3HV (3 %)], polyvinyl alcohol [PVA (3 %)] and natural rubber [NR (3 %)].
The genome mining for plant growth-promoting traits identified 3175 genes that are associated with the colonizing plant system (26 %), competitive exclusion (21 %), stress control (21 %), biofertilization (14 %), phytohormone and plant signal production (10 %), bioremediation (7 %) and plant immune response stimulation (1 %). These genome mining results are an indication of the biotechnological and ecological significance of the novel strain LB-0542 for sustainable biocatalytic processing of Sargassum and plastic-containing waste. The genome sequence data is available in DDBJ/EMBL/GenBank with the accession number BAAIAE000000000.
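Several abstracts in this listing report N50 and L50 values. As a reminder of what those metrics measure, here is a minimal sketch under the common definition: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the toy contig lengths are hypothetical; the abstract above does not list the individual contig lengths of LB-0542.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' contigs were needed, so it is the L50.
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


# Toy lengths (illustrative only): total 150, half 75.
# 50 -> 50 (< 75), then 50 + 40 = 90 (>= 75), so N50 = 40 and L50 = 2.
print(n50_l50([50, 40, 30, 20, 10]))
```

Under this definition, an L50 of 2 together with 12 contigs, as reported above, means the two longest contigs alone cover at least half of the assembly.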
- New
- Research Article
- 10.1016/j.ijppaw.2025.101183
- Apr 1, 2026
- International journal for parasitology. Parasites and wildlife
- Xianghe Wang + 7 more
Complete mitochondrial genome pathological characteristics and scanning electron microscopic observations of Armillifer moniliformis isolated from Manis javanica.
- New
- Research Article
- 10.1016/j.pbi.2026.102859
- Apr 1, 2026
- Current opinion in plant biology
- Todd P Michael
Plant genome biology is entering a new era defined by fully phased, chromosome-scale, telomere-to-telomere assemblies, enabled by the convergence of long-read sequencing technologies, improved assembly algorithms, and powerful scaffolding strategies. Gapless, haplotype-resolved genomes are now feasible even for polyploid species, shifting the bottleneck from assembly to annotation and interpretation. Genome annotation remains one of the greatest opportunities and challenges in plant biology. While ab initio methods still form the backbone of structural prediction, evidence-based frameworks that integrate RNA sequencing, chromatin accessibility, methylation, and 3D genome data are rapidly advancing the field. At the same time, artificial intelligence-driven protein-coding gene predictors are redefining ab initio gene finding, and large-scale orthology networks continue to improve functional inference. The next frontier is extending annotation beyond protein-coding genes into regulatory and structural dimensions, a goal increasingly enabled by single-cell and multi-omic technologies. Looking forward, the integration of AI, multi-omics, and large language models promises to standardize and automate workflows from DNA isolation to functional annotation. These innovations will accelerate fundamental plant biology discovery, enable next-generation biodiversity conservation, and transform strategies for crop improvement and biotechnology.
- New
- Research Article
- 10.1016/j.dib.2026.112543
- Apr 1, 2026
- Data in brief
- Kankana Roy + 1 more
We present a draft genome dataset for Xylaria sp. (KR-3U) isolated as an endophyte from Catharanthus roseus leaves in India. Whole genome sequencing was performed using the Illumina NovaSeq 6000 platform, generating 35.2 million paired-end raw reads (150 bp), providing ∼120× coverage (∼5.32 Gb of raw data) for a 44.24 Mb assembly (960 contigs >1 kb, GC content of 47.76%, and an N50 of 101,126 bp). Read remapping showed 96.07% alignment to the assembly. BUSCO (fungi_odb10) analysis indicated 97.0% completeness. Gene prediction using AUGUSTUS identified 11,916 protein-coding genes. BLASTp searches against the Swiss-Prot database yielded significant hits for 7299 proteins, of which 7204 were mapped to Gene Ontology (GO) terms and 5869 sequences received functional annotations. Integration of InterProScan-supported annotations resulted in 5645 proteins assigned at least one GO term. KEGG KAAS assigned 4,144 genes (3,391 KO numbers) to diverse pathways. Carbohydrate-active enzyme (CAZyme) analysis revealed 556 CAZyme-encoding genes, with 39.74% (221 genes) predicted to be secreted. antiSMASH detected 111 biosynthetic gene clusters (BGCs), including polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), terpenes and hybrid clusters. The complete genome sequence and raw reads have been deposited in the National Center for Biotechnology Information (NCBI) under the GenBank accession number JBSEFG000000000, BioProject number PRJNA1335662, BioSample ID SAMN52018968, and SRA (raw reads) accession number SRR35731853. The genome assembly (FASTA), gene annotation (GFF3), and secondary genome analysis files are deposited in Mendeley Data (Version 3; DOI: 10.17632/b8jn5rtwkg.3) under a CC BY 4.0 license to ensure transparency, reproducibility, and reuse of the analyses.
- New
- Research Article
- 10.1016/j.meegid.2026.105903
- Apr 1, 2026
- Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases
- Muyideen Kolapo Tijani + 7 more
Human babesiosis is an emerging disease as more cases are being reported worldwide. Most cases in Europe are caused by Babesia divergens, whereas most cases in North America are due to Babesia microti. While B. microti is also found throughout Europe, it appears to be less pathogenic. We generated high-quality nuclear and organellar genome assemblies of two B. divergens and two B. microti isolates from Germany by long-read sequencing and compared them with each other and with the reference RI B. microti from the US. Variants of VESA1 (variant erythrocyte surface antigen 1) and secreted antigen 1/3 dominated the genes of the B. divergens isolates. The B. microti isolates and RI B. microti lacked VESA1. However, the B. microti isolates differed in lacking 37 other proteins found in RI B. microti. These RI-specific proteins were intracellular, secreted, and membrane-bound. The lack of one or more of these genes in the new B. microti isolates may be the reason why US strains are more pathogenic than B. microti from Europe. The ability of B. divergens to evade the immune system using VESA1, perhaps in combination with some secreted antigens, may be responsible for its higher pathogenicity compared with the European B. microti strains. This study has improved our understanding of the pathogenesis of babesiosis, and the new genomic data provided here has increased the repertoire of available genomic information about Babesia, especially since our new B. microti genomes are the first from Europe.
- New
- Research Article
- 10.1016/j.ympev.2025.108522
- Apr 1, 2026
- Molecular phylogenetics and evolution
- Michael J Buontempo + 11 more
Evolutionary history of Ridge-nosed Rattlesnakes (Crotalus willardi): A specialized and diverse montane species.
- Research Article
- 10.1186/s12870-026-08519-5
- Mar 14, 2026
- BMC plant biology
- Jiangtao Wang + 10 more
Caprifoliaceae is a cosmopolitan plant family encompassing a large number of species, with significant economic and ecological values. Species divergence and adaptation in Caprifoliaceae are not well understood due to extensive hybridization and rapid radiation. Here, we assembled and compared the complete mitochondrial genomes of Kolkwitzia amabilis and Triplostegia glandulifera, which are distributed in distinct habitats with unique life forms, to elucidate the structural variation of mitogenomes and their implications for species divergence and environmental adaptation. The mitochondrial genome of K. amabilis consists of two circular molecules with sizes of 540,375 bp and 197,940 bp, respectively, while T. glandulifera has a single circular genome of 642,933 bp. The two genomes exhibit a similar preference for A/U bases in codon usage, mainly driven by natural selection, but present significant differences in mode and the number of repeat types. Among 25 shared genes in six Caprifoliaceae species, atp1 had the highest nucleotide diversity; meanwhile, atp4, ccmB, and mttB were under positive selection. Notably, rps7 in T. glandulifera had potentially undergone species-specific positive selection. A phylogenetic tree of Caprifoliaceae species based on 19 conserved protein-coding genes (PCGs) showed that K. amabilis and T. glandulifera formed a sister group, providing molecular evidence at the mitogenome level for the systematic classification of plants in this family. Furthermore, analyses of ancestral genome reconstruction indicated that both mitogenomes exhibited significant gene order rearrangements and gene deletions, and their duplication patterns of nad1 and nad2 genes differed from those of other Caprifoliaceae species, suggesting that dynamic structural changes were an important feature of mitogenome evolution in Caprifoliaceae.
In conclusion, these first reported mitogenomes of two distinctive species provide important molecular data for studies of phylogeny, genome evolution, and potential divergence and adaptation in Caprifoliaceae.
- Research Article
- 10.1038/s41597-026-07043-3
- Mar 13, 2026
- Scientific data
- Lijia Chen + 3 more
The Leschenault's rousette (Rousettus leschenaultii) represents a medium-sized bat of the genus Rousettus that feeds on fruits or flowers, mainly distributed in Southeast Asia. Here, we generated a chromosome-level genome assembly of R. leschenaultii using a hybrid MGI and PacBio sequencing approach, facilitated by the chromosome assignment using the high-throughput chromatin conformation capture (Hi-C) sequencing technology. The genome size was 1.95 Gb with a scaffold N50 of 116.99 Mb, and 99.00% of the assembled sequences were anchored to 17 autosomes and the two sex chromosomes (X and Y). The completeness of the assembly was estimated to be 96.4% using BUSCO. In total, 19,625 genes were predicted from this genome assembly, with 97.83% of them being functionally annotated. This high-quality assembly of R. leschenaultii serves as a valuable genetic resource for exploring the genomic basis of evolutionary adaptation, and for conducting population, ecological, and conservation genomic studies.
- Research Article
- 10.1002/advs.202513287
- Mar 13, 2026
- Advanced science (Weinheim, Baden-Wurttemberg, Germany)
- Kainan Li + 9 more
The fungal genus Diaporthe poses a significant threat to global food security by causing devastating crop diseases, including soybean seed decay and stem blight caused by D. longicolla. However, the molecular basis of its pathogenicity and the evolutionary mechanisms underlying its virulence remain poorly understood. Here, we present complete telomere-to-telomere genome assemblies of four Diaporthe species, revealing extensive chromosomal rearrangements correlating with phylogenetic divergence. Comparative analyses of 34 Diaporthe genomes identified secondary metabolism genes as the most variable fraction. Comprehensive genome exploration across fungi has revealed that Diaporthe harbors the largest repertoire of secondary metabolite biosynthetic gene clusters (SMBGCs) reported to date. We demonstrate that frequent chromosomal rearrangements and rapid intra-cluster gene variation are key drivers of SMBGC diversification, thereby accelerating the evolution of these gene clusters. Interestingly, we identified horizontal gene transfer events that further expanded the metabolic potential of these clusters. Functional characterization of the five rapidly evolving SMBGCs identified demonstrated their direct role in mediating pathogenicity, underscoring the biological significance of their rapid diversification. Collectively, this study establishes chromosomal plasticity as a crucial mechanism for ecological adaptation and secondary metabolite arsenal expansion in plant pathogens, providing new insights into the evolution of fungal virulence.
- Research Article
- 10.1093/plphys/kiag133
- Mar 13, 2026
- Plant physiology
- Daozong Chen + 13 more
Wallflower (Erysimum cheiri) belongs to the monogeneric Erysimeae tribe of the mustard family (Brassicaceae). It is widely cultivated as an ornamental garden plant and appreciated for its diverse flower colors. However, the absence of a high-quality genome has hampered research on wallflower genome evolution and the mechanisms underlying variations in flower color. Here, we assembled a nearly gap-free telomere-to-telomere genome of E. cheiri. The assembled genome enabled the reconstruction of genome evolution in the genus Erysimum (274 species), tracing the changes from the ancestral n = 8 genome (in E. cheiranthoides) to the derived genomes with seven (in E. nevadense) and six (in E. cheiri) chromosome pairs. While the reduction from n = 8 to n = 7 was mediated by a nested chromosome fusion accompanied by inversions, the further decrease to n = 6 in E. cheiri resulted from an end-to-end translocation involving the other two non-homologous chromosomes. Compared with other Brassicaceae species, E. cheiri showed a notable expansion of gene families related to secondary metabolite biosynthesis. Its flower color variation was primarily determined by the biosynthesis and accumulation of carotenoids and flavonoids. We mapped the metabolic pathways for carotenoids and flavonoids, identifying the hub genes regulating their biosynthesis. This research lays an important foundation for understanding the chromosomal and genome evolution of the Erysimeae tribe and paves the way for future investigations into genetic studies and breeding applications of E. cheiri.
- Research Article
- 10.1038/s41597-026-07037-1
- Mar 12, 2026
- Scientific data
- Kuo Gao + 5 more
Gymnodiptychus pachycheilus, a specialized schizothoracinae fish, exhibits remarkable morphological regression, such as near-complete scale loss, making it an important model for understanding the genomic basis of adaptation and phylogeny in this lineage. Here, we assembled a high-quality chromosome-level reference genome of G. pachycheilus by Illumina short reads, PacBio HiFi long reads, and Hi-C technologies. The final genome assembly spans 1.83 Gb, with 95.52% of the sequences anchored onto 25 pseudochromosomes, a contig N50 of 71.36 Mb, and a BUSCO completeness of 98.40%. Repetitive sequences accounted for 47.75% of this genome. We predicted 48,952 protein-coding genes, of which 92.51% were functionally annotated using multiple public databases, together with diverse classes of non-coding RNAs. This genome assembly represents the first high-quality reference for G. pachycheilus and provides a valuable resource for exploring genome evolution, rediploidization processes, and adaptive mechanisms of schizothoracinae fishes on the Qinghai-Tibet Plateau.
- Research Article
- 10.1016/j.celrep.2026.117074
- Mar 12, 2026
- Cell reports
- Griffin D Haas + 12 more
De novo recovery of Ghana virus, an African bat Henipavirus, reveals differential tropism and attenuated pathogenicity compared to Nipah virus.
- Research Article
- 10.1007/s00122-026-05179-9
- Mar 12, 2026
- TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
- Yuk Woon Cheung + 11 more
The complexity of potato genetics, characterised by tetrasomic inheritance, has contributed to slower genetic gain in potato compared to other major crops. Disease resistance genes, often found in large clusters of highly similar paralogs and alleles, further complicate genetic studies. The H1 resistance locus, introgressed into potato cultivars from Solanum tuberosum ssp. andigena, has been successfully used for over 60 years to control Globodera rostochiensis in Europe. Although previous genetic studies mapped this resistance to chromosome 5, the complete structure of the locus remained elusive. To reduce genomic complexity, we generated a dihaploid of the cultivar 'Athlete', DH4_Athlete, carrying the H1 resistance locus, and produced a phased haplotype representation of the H1 interval using Oxford Nanopore sequencing. Combined with RenSeq-based association genetics, this approach allowed us to reconstruct the entire H1 locus, including recombination points at both the 5' and 3' ends of the interval.
- Research Article
- 10.1038/s41597-026-07039-z
- Mar 11, 2026
- Scientific data
- Yuxia Yang + 5 more
Ichthyurus bourgeoisi Gestro is a representative species of the tribe Ichthyurini within the beetle family Cantharidae. This tribe is particularly noteworthy because of its brachelytrous characteristics. However, the lack of high-quality genomic resources hinders our understanding of the evolution and ecological adaptions associated with this beetle group. In this study, we present a chromosome-level genome assembly for I. bourgeoisi constructed using a combination of PacBio HiFi and Hi-C sequencing data. The genome spans 664.72 Mb, with a scaffold N50 of 98.12 Mb, and is organized into seven pseudo-chromosomes, including a chromosome X validated through analyses of genome collinearity and sequencing depth. Repeat sequences account for 65.35% of the genome, and 13,386 protein-coding genes are identified. The high-quality genome assembly and annotation has been corroborated by multiple metrics, including genome size, reads mapping rate, and BUSCO completeness (98.6%). This comprehensive genomic resource provides a foundation for elucidating the ecological adaption of I. bourgeoisi and advancing our understanding of morphological evolution in Ichthyurini within Cantharidae.
- Research Article
- 10.1186/s13059-026-04028-8
- Mar 11, 2026
- Genome biology
- Penglong Wan + 12 more
Centromeres are chromosomal loci epigenetically specified by the histone variant CENH3, where kinetochores assemble to ensure accurate chromosome segregation during cell division. Their repetitive and rapidly evolving DNA has long impeded large-scale characterization. Advances in long-read sequencing now enable complete genome assemblies across species and within populations, providing opportunities to investigate how centromeres evolve and diversify over timescales from thousands to millions of years. Here, we generate near-telomere-to-telomere genome assemblies for eggplant, African eggplant, and wild pepper. Using CENH3 ChIP-seq, we delineate functional centromeric chromatin in these assemblies and in the cultivated pepper 'CA59', tomato 'Heinz 1706', and a wild tomato accession. These genomes harbor satellite-free centromeres across all chromosomes except chromosome 3 in tomato and its wild progenitor. Instead, centromeres are primarily composed of Ty3/Gypsy LTR retrotransposons, whose clade composition, abundance, recent activity, and spatial distribution differ among species. Centromere size scales with genome size in Solanaceae crops. Comparisons of closely related genomes reveal frequent centromere positional shifts driven by pericentromeric inversions and centromere repositioning. Synteny decays more rapidly around centromeres, consistent with elevated breakage within CENH3-binding regions. Finally, centromere haplotypes vary within species, exemplified by multiple haplotypes on four African eggplant chromosomes. These findings highlight the remarkable evolutionary dynamics and within-species variation of centromeres in Solanaceae crops, revealing distinct species-specific organizational patterns. This study positions Solanaceae as a promising model for comparative analyses of plant centromere evolution and provides a foundation for future research exploring how centromere variation contributes to phenotypic diversity.
- Research Article
- 10.1038/s41467-026-70421-3
- Mar 10, 2026
- Nature communications
- Ruben Millan-Solsona + 11 more
Atomic force microscopy (AFM) is a widely used tool for nanoscale characterization across materials science, energy research, and biology. However, its adoption in high-throughput materials discovery and statistically driven studies remains limited by a strong dependence on expert operator input and by the scarcity of annotated experimental AFM datasets needed to enable data-driven automation. Here, we introduce SimuScan, a synthetic-data-driven framework that enables reliable AFM feature identification, segmentation, and targeted imaging without requiring large manually labeled experimental datasets. SimuScan generates tunable, high-fidelity synthetic AFM images of defined morphologies while incorporating realistic experimental artifacts, including tip-sample convolution, noise, flattening distortions, and surface debris. These datasets are shown to support scalable, label-free training of modern deep learning models for AFM analysis. When integrated into data-driven AFM workflows, SimuScan-trained models can locate and analyze nanoscale structures across large datasets and guide targeted follow-up imaging. We validate this approach on nanostructured surfaces, DNA assemblies, and bacterial cells, demonstrating robust generalization across diverse sample types with minimal operator intervention. More broadly, this work establishes a general strategy for generating explicitly conditioned, task-relevant synthetic data to improve the reliability of downstream models in autonomous microscopy.
- Research Article
- 10.1038/s41597-026-07013-9
- Mar 10, 2026
- Scientific data
- Meixin Yang + 5 more
Wheat Snow Mold (WSM), caused by Microdochium species, poses a serious threat to global wheat production. Despite its importance, the genetic and molecular mechanisms of Microdochium remain poorly understood. In particular, genome-wide differences between M. majus and M. nivale, which were previously considered a single species, have not been fully elucidated. Here, we present the first high-quality telomere-to-telomere genome assemblies of M. majus (231095) and M. nivale (231047), based on Nanopore and Illumina sequencing, with genome sizes of 36.50 Mb and 37.27 Mb. Each assembly was anchored to 13 chromosomes and one circular mitochondrial genome. We identified 11,432 and 11,904 protein-coding genes, with BUSCO completeness scores of 98.5% and 99.3%; of these, 11,094 and 11,504 genes were functionally annotated. Comparative genomics revealed a high degree of collinearity between the two strains, along with segment relocations and gene presence/absence variations. This study enhances our understanding of the genetic foundations of M. majus and M. nivale, laying the groundwork for future research on genetic evolution and disease management.