  • Research Article
  • 10.1093/nargab/lqag007
SM3DD with segmented PCA: a comprehensive method for interpreting 3D spatial transcriptomics
  • Mar 1, 2026
  • NAR Genomics and Bioinformatics
  • Tony Blick + 19 more

We developed Standardised Minimum 3D Distance (SM3DD), an approach to analysing spatial RNA datasets that requires no cell segmentation or annotation, and used it to compare lung tissue from 16 clinically normal individuals with that of 18 SARS-CoV-2 patients who died from acute respiratory distress syndrome. RNA spatial coordinates were determined using the CosMx™ Spatial Molecular Imager (Bruker Spatial Biology, US). For each transcript location, we calculated the three-dimensional distance to the nearest transcript of each transcript type and standardised the distances for each transcript type. Mean SM3DDs were compared between normal individuals and SARS-CoV-2 patients. Notably, hierarchical clustering of the directional log10(P) values organized genes by function, making the biological context easier to interpret. For FKBP11, for example, the most significant difference was a decrease in distance to MZT2A, suggesting a role in interferon signalling. Using a segmented principal components analysis of the entire SM3DD dataset, we identified multiple pathways, including 'SARS-CoV-2 infection', even though the assay did not include any SARS-CoV-2 transcripts.
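
The core SM3DD computation described above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the authors' implementation: it assumes transcripts are given as (gene, (x, y, z)) tuples, excludes each transcript from its own nearest-neighbour search, and assumes that "standardising" means z-scoring the distances separately for each target transcript type. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Transcript = Tuple[str, Tuple[float, float, float]]  # (gene, (x, y, z))

def min_distances(transcripts: List[Transcript]) -> List[Dict[str, float]]:
    """For each transcript, the 3D distance to the nearest *other* transcript of each gene."""
    result = []
    for i, (_, p) in enumerate(transcripts):
        nearest: Dict[str, float] = {}
        for j, (gene, q) in enumerate(transcripts):
            if i == j:
                continue  # a transcript is not its own neighbour
            d = math.dist(p, q)
            if d < nearest.get(gene, math.inf):
                nearest[gene] = d
        result.append(nearest)
    return result

def standardise(distances: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Z-score the distances separately for each target gene (assumed standardisation)."""
    by_gene: Dict[str, List[float]] = {}
    for row in distances:
        for gene, d in row.items():
            by_gene.setdefault(gene, []).append(d)
    stats = {g: (mean(ds), pstdev(ds)) for g, ds in by_gene.items()}
    return [
        {g: (d - stats[g][0]) / stats[g][1] if stats[g][1] > 0 else 0.0 for g, d in row.items()}
        for row in distances
    ]

def mean_sm3dd(transcripts: List[Transcript], z: List[Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Mean standardised distance for each (source gene, target gene) pair."""
    groups: Dict[Tuple[str, str], List[float]] = {}
    for (src, _), row in zip(transcripts, z):
        for tgt, value in row.items():
            groups.setdefault((src, tgt), []).append(value)
    return {pair: mean(values) for pair, values in groups.items()}
```

A real CosMx dataset contains millions of transcripts, so a spatial index such as a k-d tree would replace the O(n²) loop; the per-pair means returned by `mean_sm3dd` are the kind of values that would then be compared between the normal and SARS-CoV-2 groups.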

  • Research Article
  • 10.1093/nargab/lqag006
NGSTroubleFinder: a tool for detection and quantification of contamination and kinship across human NGS data
  • Mar 1, 2026
  • NAR Genomics and Bioinformatics
  • Samuel Valentini + 5 more

Quality control is a critical component of any next-generation sequencing (NGS) pipeline; however, most existing pipelines emphasize technical quality assessment (e.g. read quality, alignment metrics, duplication rates) while overlooking other equally important dimensions, such as sample identity verification, contamination detection, kinship analysis, and metadata concordance. Detecting issues such as cross-sample contamination and sample swaps is essential to ensure data integrity. Here, we present NGSTroubleFinder, a novel tool that detects cross-sample contamination, sample swaps, and mismatches between the reported and the inferred genetic and transcriptomic sexes in human whole-genome and whole-transcriptome sequencing data. It runs directly on BAM/CRAM files without requiring additional variant-calling steps and offers an integrated quality-control pipeline for NGS data, particularly data generated in clinical studies or in research projects involving family members. It produces a detailed report, in both text and HTML formats, that combines the results of its multiple analyses (including kinship, sex prediction, and contamination metrics) and provides extensive per-sample information with key plots for easy interpretation. NGSTroubleFinder is written in Python, incorporates a custom-built parallelized pileup engine written in C, and can be installed with pip. The source code and models are freely available on GitHub (https://github.com/STALICLA-RnD/NGSTroubleFinder), and a containerized version is available on Docker Hub (https://hub.docker.com/r/staliclarnd/ngstroublefinder).
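
Sample-swap and relatedness checks commonly rest on comparing genotypes at shared polymorphic sites. The sketch below illustrates that general idea working directly from pileup-style allele counts, in the spirit of skipping a separate variant-calling step. It is not NGSTroubleFinder's algorithm: the input format, the 10-read depth cut-off, and the 0.1/0.9 allele-fraction thresholds are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: site id -> (reference-allele reads, alternate-allele reads)
Pileup = Dict[str, Tuple[int, int]]

def call_genotype(ref_reads: int, alt_reads: int, min_depth: int = 10) -> Optional[int]:
    """Crude genotype from allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt, None = too few reads."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < 0.1:
        return 0
    if alt_fraction > 0.9:
        return 2
    return 1

def genotype_concordance(a: Pileup, b: Pileup, min_depth: int = 10) -> Optional[float]:
    """Fraction of sites, callable in both samples, where the crude genotypes agree."""
    agree = compared = 0
    for site in a.keys() & b.keys():
        ga = call_genotype(*a[site], min_depth=min_depth)
        gb = call_genotype(*b[site], min_depth=min_depth)
        if ga is None or gb is None:
            continue  # skip low-coverage sites
        compared += 1
        agree += ga == gb
    return agree / compared if compared else None
```

Concordance close to 1 between two libraries labelled as different people points to a swap or duplicate; unrelated individuals are expected to score markedly lower, with close relatives typically in between.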

  • Research Article
  • 10.1093/nargab/lqag014
ANOMALY: a Snakemake pipeline for identifying NuMTs from long-read sequencing data
  • Mar 1, 2026
  • NAR Genomics and Bioinformatics
  • Nirmal S Mahar + 3 more

Nuclear mitochondrial DNA segments (NuMTs) can contribute to cancer development and disease progression by disrupting protein-coding genes. Their presence also confounds mitochondrial variant detection, underscoring the need for robust NuMT detection. Current NuMT-calling methods rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations, but no workflow has existed to capture NuMTs from long-read data. Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline accepts raw or aligned sequencing data, then calls and visualises the NuMTs in each sample. On 50 simulated datasets, the pipeline demonstrated high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. The pipeline highlights the limitations of short-read data in resolving and capturing complex NuMTs and demonstrates that long-read data enable their accurate identification. The Snakemake pipeline uses Python, Bash, and R and is released under the open-source GNU GPL v3 license. Detailed instructions for setting up and running the pipeline, along with the source code, are available at https://github.com/Nirmal2310/ANOMALY.
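
As a quick consistency check on the reported benchmark, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The snippet below recomputes it from the two reported values; the helper name is ours, not part of ANOMALY.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported ANOMALY benchmark on 50 simulated datasets
print(round(f1_score(1.000, 0.989), 3))  # prints 0.994
```

With perfect precision, F1 reduces to 2R / (1 + R), so the small recall shortfall alone accounts for the gap from 1.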

  • Research Article
  • 10.1093/nargab/lqag009
Robin: an advanced tool for comparative loop caller result analysis leveraging large language models
  • Mar 1, 2026
  • NAR Genomics and Bioinformatics
  • H M A Mohit Chowdhury + 2 more

Genomics research has seen a surge of new methods, and chromosome loop detection algorithms (also known as loop callers) are one notable area of progress. However, despite these advances, no platform is available to analyze, compare, or benchmark the results of current tools on demand. Such a platform is crucial for accelerating research and ensuring the reliability and effectiveness of new methods in this field. In this work, we therefore propose Robin, an advanced, ready-to-use platform for comparative analysis of loop caller results that leverages large language models (LLMs). Robin is a web server that analyzes loop caller results using a comprehensive range of metrics, such as recovery and overlap. It is tightly integrated with HiGlass for interactive, multi-resolution visualization of Hi-C matrices and loop annotations, allowing users to visually inspect and validate loop structures in a genomic context. Robin also incorporates LLM capabilities that let users generate customized plots and figures from natural-language instructions. Overall, Robin is a robust and comprehensive tool for analyzing and visualizing loop caller results. It is publicly accessible at http://hicrobin.online, with comprehensive documentation at http://documentation.hicrobin.online/.
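
Metrics such as recovery and overlap are usually defined by matching loops between call sets with some positional tolerance. The sketch below shows one plausible definition; the `Loop` record, the 10 kb anchor tolerance, and the exact metric definitions are assumptions for illustration, not Robin's documented formulas.

```python
from typing import List, NamedTuple

class Loop(NamedTuple):
    chrom: str
    anchor1: int  # position of the first anchor (bp)
    anchor2: int  # position of the second anchor (bp)

def loops_match(a: Loop, b: Loop, tolerance: int = 10_000) -> bool:
    """Two loops match if on the same chromosome and both anchors lie within `tolerance` bp."""
    return (a.chrom == b.chrom
            and abs(a.anchor1 - b.anchor1) <= tolerance
            and abs(a.anchor2 - b.anchor2) <= tolerance)

def overlap(calls_a: List[Loop], calls_b: List[Loop], tolerance: int = 10_000) -> List[Loop]:
    """Loops in calls_a that have at least one match in calls_b."""
    return [a for a in calls_a if any(loops_match(a, b, tolerance) for b in calls_b)]

def recovery(reference: List[Loop], called: List[Loop], tolerance: int = 10_000) -> float:
    """Fraction of reference loops recovered by a caller."""
    if not reference:
        return 0.0
    return len(overlap(reference, called, tolerance)) / len(reference)

reference = [Loop("chr1", 100_000, 500_000), Loop("chr1", 200_000, 800_000), Loop("chr2", 50_000, 300_000)]
called = [Loop("chr1", 105_000, 495_000), Loop("chr2", 80_000, 300_000)]
print(recovery(reference, called))  # one of three reference loops recovered
```

A fixed base-pair tolerance is the simplest choice; a resolution-aware tolerance (for example, a number of Hi-C bins) would be a natural alternative when comparing callers run at different resolutions.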

  • Research Article
  • 10.1093/nargab/lqag023
Recurrence plot reconstruction reveals chromosomal reorganization before territory formation
  • Mar 1, 2026
  • NAR Genomics and Bioinformatics
  • Yuki Kitanishi + 3 more

Chromatin conformation capture methods such as Hi-C have improved our understanding of nuclear architecture. However, structural reconstruction from single-cell Hi-C (scHi-C) data is challenging because each cell yields few DNA contacts. We previously developed the recurrence plot-based reconstruction (RPR) method, which reconstructs three-dimensional (3D) genomic structure from Hi-C data even when DNA contact information is sparse. Here, as a proof of concept for studying global chromosomal architecture, we used RPR to analyze scHi-C data from early-stage F1 hybrid embryos. We found that paternal and maternal chromosomes become gradually intermingled from the 1-cell to the 64-cell stage, and that discrete chromosome territories are largely established between the 8-cell and 64-cell stages. We observed Rabl-like polarization of chromosomes from the 2- to 8-cell stage, which had mostly dissolved by the 64-cell stage, as well as transient rod-like extension and parallel chromosome alignment at the 4-cell stage. These findings indicate dynamic chromosomal reorganization before territory formation. Together, RPR and scHi-C capture the 3D chromosomal architecture of individual cells during early embryogenesis.

  • Open Access
  • Research Article
  • 10.1093/nargab/lqag017
FragmentFinder—a user-friendly, Windows-based tool for identifying and characterizing short RNAs excised from any noncoding RNA
  • Feb 16, 2026
  • NAR Genomics and Bioinformatics
  • Jeffrey D Demeis + 6 more

Noncoding RNAs <200 nucleotides (nt) in length are referred to as short noncoding RNAs (sncRNAs) and include microRNAs (miRNAs), piwi-interacting RNAs, small nucleolar RNAs, transfer RNAs, and others. A striking example of the regulatory capabilities of sncRNAs comes from miRNAs, a group of small yet potent RNAs. MiRNAs are ∼20-nt RNAs excised from longer pre-miRNA hairpins; to date, thousands of miRNAs have been identified across many species, with specific roles defined in virtually every cellular activity (e.g. growth, differentiation, apoptosis, and disease). Importantly, studies evaluating transcriptomic changes in miRNAs have revealed miRNA-like fragments derived from other types of sncRNAs, suggesting that these novel sncRNA fragments may have similar regulatory capacities. Unfortunately, many biologically relevant sncRNA-excised fragments remain uncharacterized, both because they were routinely dismissed as “sncRNA degradation products” during initial miRNA characterizations and because nearly all sncRNA informatic analyses still assess only annotated miRNA expression. To address this, several platforms for identifying novel sncRNA fragments have recently been developed. However, the principal analytical tools currently used to characterize novel sncRNA fragments often require significant computational expertise, which hinders their widespread use. A user-friendly platform that requires minimal programming experience yet can identify and characterize RNA fragments excised from any sncRNA in any species is therefore highly desirable and potentially impactful. To meet this need, we have developed FragmentFinder, an intuitive, Windows-executable resource designed to require no computational background and capable of accurately characterizing all (annotated and unknown) sncRNA-derived RNAs within a raw small RNA sequencing file in real time.
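
At its simplest, characterizing an sncRNA-derived fragment means locating a read within its parent RNA and describing where it was excised from. The sketch below uses exact string matching and labels hits as 5', 3', or internal fragments. The two-nucleotide end slack, the labels, and the toy sequence are assumptions, and this is not FragmentFinder's method, which the abstract does not describe; practical tools also tolerate mismatches and non-templated additions.

```python
from typing import List, Tuple

def locate_fragment(read: str, parent: str, end_slack: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for every exact occurrence of `read` in `parent`.

    Coordinates are 0-based and end-exclusive. A hit starting within `end_slack` nt of
    the parent's 5' end is labelled a 5' fragment, one ending within `end_slack` nt of
    the 3' end a 3' fragment, and anything else an internal fragment.
    """
    if not read:
        return []
    hits = []
    start = parent.find(read)
    while start != -1:
        end = start + len(read)
        if start <= end_slack:
            label = "5' fragment"
        elif len(parent) - end <= end_slack:
            label = "3' fragment"
        else:
            label = "internal fragment"
        hits.append((start, end, label))
        start = parent.find(read, start + 1)  # also report overlapping occurrences
    return hits

toy_parent = "AAACCCGGGTTTACGT"  # toy sequence, not a real sncRNA
print(locate_fragment("TACGT", toy_parent))  # [(11, 16, "3' fragment")]
```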

  • Open Access
  • Research Article
  • 10.1093/nargab/lqag021
DoBSeqWF: a framework for sensitive detection of individual genetic variation in pooled sequencing data
  • Feb 16, 2026
  • NAR Genomics and Bioinformatics
  • Mads Cort Nielsen + 11 more

Population screening for rare genetic diseases has the potential to increase early diagnosis and treatment, but the high cost of next-generation sequencing limits widespread implementation. Double-batched sequencing (DoBSeq) is a cost-effective method that uses two-dimensional overlapping pool sequencing to enable individual-level rare variant detection. However, the resulting high-depth, complex data require a specialized workflow for efficient, sensitive, and reproducible analysis. We developed DoBSeqWF (DoBSeq Workflow), a Nextflow-based pipeline that processes pooled sequencing data from alignment through variant calling, filtering, and final variant assignment. Using a childhood cancer cohort of 200 individuals with whole genome sequencing as a reference, we created training and validation datasets, benchmarked multiple variant callers, and implemented machine learning filters to improve rare variant detection while maintaining high sensitivity. DoBSeqWF demonstrates accurate and scalable rare variant detection within the evaluated experimental setting and provides a promising avenue for future cost-effective genetic screening programmes.
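
The two-dimensional pooling idea behind DoBSeq can be illustrated with a grid: each individual is placed in one row pool and one column pool, so a variant seen in exactly one row pool and one column pool points to a single person. This sketch assumes a simple row-major layout; the function names and grid size are hypothetical, and DoBSeqWF's actual assignment step also involves variant calling and machine-learning filters that are not modelled here.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

def build_grid(sample_ids: List[str], n_cols: int) -> Dict[Tuple[int, int], str]:
    """Place samples row by row into a grid; each sample gets one row pool and one column pool."""
    return {(i // n_cols, i % n_cols): s for i, s in enumerate(sample_ids)}

def candidate_carriers(grid: Dict[Tuple[int, int], str],
                       positive_rows: Iterable[int],
                       positive_cols: Iterable[int]) -> List[str]:
    """Samples at every intersection of a variant-positive row pool and column pool."""
    return sorted(grid[rc] for rc in product(positive_rows, positive_cols) if rc in grid)

grid = build_grid([f"S{i}" for i in range(9)], n_cols=3)  # 3 x 3 design: 9 people, 6 pools
print(candidate_carriers(grid, [1], [2]))        # ['S5']: unique assignment
print(candidate_carriers(grid, [0, 2], [0, 1]))  # ['S0', 'S1', 'S6', 'S7']: ambiguous
```

The second call shows why overlapping designs need extra resolution steps: two carriers in different rows and columns light up four intersections, only two of which are true carriers.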

  • Open Access
  • Research Article
  • 10.1093/nargab/lqag016
Accelerating rare disease diagnostics by linking DNA and RNA through an explainable and interactive RNA-guided workflow
  • Feb 11, 2026
  • NAR Genomics and Bioinformatics
  • Willem T K Maassen + 10 more

A major challenge preventing mainstream use of RNA sequencing (RNA-seq) in genome diagnostics is biological and technical variation, typically caused by intrinsic differences in gene expression between tissue types, cellular conditions, and environmental factors. Although machine learning methods can partially correct unwanted variation, interpreting RNA-seq data generated by different sources over time, which is a realistic scenario in healthcare, remains challenging and complex. We developed a complete RNA-guided workflow that handles such variation and can therefore identify gene–disease associations in the context of genomic, phenotypic, and segregation analysis of rare disease patients. The result is a streamlined implementation of OUTRIDER and FRASER, complemented with Borzoi and MOLGENIS VIP. This novel workflow paves the way for pinpointing rare variants that affect gene expression and splicing, using self-contained interactive reports that visualize outlier genes and prioritized patient-level variants for immediate clinical interpretation. We analysed 144 cases from different centres, a realistic cohort for centres that are more likely to depend on background cohorts. We demonstrate that RNA outlier analysis enhances variant interpretation and, despite its limitations, can already aid clinical variant interpretation. Our workflow accelerates the prioritization of coding and non-coding variants and the reclassification of clinically relevant variants of unknown significance.

  • Open Access
  • Research Article
  • 10.1093/nargab/lqag012
An ELIXIR scoping review on domain-specific evaluation metrics for synthetic data in life sciences
  • Feb 11, 2026
  • NAR Genomics and Bioinformatics
  • Styliani-Christina Fragkouli + 13 more

Synthetic data (SD) has become an increasingly important asset in the life sciences, helping address data scarcity, privacy concerns, and barriers to data access. Creating artificial datasets that mirror the characteristics of real data allows researchers to develop and validate computational methods in controlled environments. Despite its promise, the adoption of SD in the life sciences hinges on rigorous evaluation metrics designed to assess its fidelity and reliability. To explore the current landscape of SD evaluation metrics across distinct life sciences domains, the ELIXIR Machine Learning Focus Group performed a systematic review of the scientific literature following the PRISMA guidelines. Six critical domains were examined to identify current practices for assessing SD. The findings reveal that, while generation methods are rapidly evolving, systematic evaluation is often overlooked, limiting researchers’ ability to compare, validate, and trust synthetic datasets across domains. This systematic review underscores the urgent need for robust, standardized evaluation approaches that not only bolster confidence in SD but also guide its effective and responsible implementation. By laying the groundwork for domain-specific yet interoperable standards, this scoping review paves the way for future initiatives aimed at enhancing the role of SD in scientific discovery, clinical practice, and beyond.

  • Open Access
  • Research Article
  • 10.1093/nargab/lqag013
A comprehensive annotation of conserved protein domains in human endogenous retroviruses
  • Feb 9, 2026
  • NAR Genomics and Bioinformatics
  • Tomàs Montserrat-Ayuso + 2 more

Human endogenous retroviruses (HERVs) occupy nearly 8% of the human genome, yet their protein-coding potential remains largely unexplored. Originating from ancestral retroviruses that infected germline cells, HERVs typically follow the canonical proviral structure LTR–gag–pol–env–LTR, where gag, pol, and env encode structural, enzymatic, and envelope proteins, respectively. We present a comprehensive resource annotating conserved retroviral domains across more than 120 000 ORFs derived from internal HERV regions. Using a reproducible pipeline based on HMMER and InterProScan, we identified over 17 000 domain hits, primarily pol-encoded domains such as reverse transcriptase, RNase H, and protease, and quantified their structural conservation. Hundreds of domains exceed 95% alignment coverage, revealing a surprising abundance of full-length retrovirus-like domains in both young and ancient families. The HERVK (HML-2) subfamily retains the most complete polyprotein architecture, including 13 loci with nearly intact Gag, Pol, and Env, but full-length Pol domains are also found in HERVH, HERVW, and HERVE. Our annotations recover conserved catalytic motifs in Pol and transmembrane features in Env, enabling fine-grained functional interpretation. All results (including BED, FASTA, domain sequences, InterProScan outputs, and transmembrane predictions) are provided as an open resource on Zenodo to support downstream analyses of HERV protein expression, immune modulation, and co-option in health and disease.
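
Alignment coverage of a domain hit is commonly expressed as the fraction of the profile model spanned by the alignment. The sketch below assumes coverage = (hmm_to - hmm_from + 1) / model length and applies the 95% threshold quoted above; the record layout, field names, and example values are hypothetical rather than taken from the published resource.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DomainHit:
    orf_id: str
    domain: str
    hmm_from: int      # first profile position covered by the alignment (1-based)
    hmm_to: int        # last profile position covered (inclusive)
    model_length: int  # length of the profile model

    @property
    def coverage(self) -> float:
        """Fraction of the profile model spanned by the alignment."""
        return (self.hmm_to - self.hmm_from + 1) / self.model_length

def near_full_length(hits: List[DomainHit], min_coverage: float = 0.95) -> List[DomainHit]:
    """Keep hits whose alignment covers more than `min_coverage` of the model."""
    return [h for h in hits if h.coverage > min_coverage]

hits = [
    DomainHit("orf_1", "reverse_transcriptase", 1, 220, 222),  # ~99% coverage
    DomainHit("orf_2", "RNase_H", 10, 60, 120),                # ~43% coverage
]
print([h.orf_id for h in near_full_length(hits)])  # ['orf_1']
```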