ESICCC as a systematic computational framework for evaluation, selection, and integration of cell-cell communication inference methods.

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performance is yet to be conducted. To fill the gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and 5 ligand/receptor-target inference methods using a total of 116 datasets, including 15 ST datasets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. Regarding accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerge as the top three best-performing methods for intercellular ligand-receptor inference based on scRNA-seq data, while stMLnet and HoloNet are the best methods for predicting ligand/receptor-target regulations using ST data. To facilitate practical application, we provide a decision-tree-style guideline that helps users choose the best tools for their specific research concerns in CCC inference, and develop an ensemble pipeline, CCCbank, that enables versatile combinations of methods and databases. Moreover, our comparative results also uncover several critical influential factors for CCC inference, such as prior interaction information, ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationship, which may be considered in future studies to advance the development of new methodologies.

Open Access
An organism-wide ATAC-seq peak catalogue for the bovine and its use to identify regulatory variants.

We herein report the generation of an organism-wide catalogue of 976,813 cis-acting regulatory elements for the bovine detected by the Assay for Transposase Accessible Chromatin using sequencing (ATAC-Seq). We regroup these regulatory elements in 16 components by nonnegative matrix factorization. Correlations between the genome-wide density of peaks and transcription start sites, between peak accessibility and expression of neighboring genes, and enrichment in transcription factor binding motifs support their regulatory potential. Using a previously established catalogue of 12,736,643 variants, we show that the proportion of single nucleotide polymorphisms mapping to ATAC-seq peaks is higher than expected and that this is due to a ~1.3-fold higher mutation rate within than outside peaks. Their site frequency spectrum indicates that variants in ATAC-seq peaks are subject to purifying selection. We generate eQTL datasets for liver and blood and show that variants that drive eQTL fall into liver and blood-specific ATAC-seq peaks more often than expected by chance. We combine ATAC-seq and eQTL data to estimate that the proportion of regulatory variants mapping to ATAC-seq peaks is approximately 1 in 3, and that the proportion of variants mapping to ATAC-seq peaks that are regulatory is approximately 1 in 25. We discuss the implications of these findings for the utility of ATAC-seq information to improve the accuracy of genomic selection.
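
The two proportions quoted at the end of this abstract are linked by Bayes' rule, which a short calculation can make explicit. The sketch below is not taken from the paper: it only rearranges the two approximate figures given above (1 in 3 and 1 in 25), and the function and variable names are hypothetical.

```python
from fractions import Fraction

# Approximate figures quoted in the abstract.
p_peak_given_regulatory = Fraction(1, 3)   # regulatory variants that map to ATAC-seq peaks
p_regulatory_given_peak = Fraction(1, 25)  # variants in ATAC-seq peaks that are regulatory


def regulatory_to_peak_ratio(p_peak_given_reg, p_reg_given_peak):
    """Return P(regulatory) / P(in peak).

    Bayes' rule gives P(reg | peak) * P(peak) = P(peak | reg) * P(reg),
    so the ratio of the two marginals is P(reg | peak) / P(peak | reg).
    """
    return p_reg_given_peak / p_peak_given_reg


ratio = regulatory_to_peak_ratio(p_peak_given_regulatory, p_regulatory_given_peak)
print(ratio, float(ratio))
```

Under these two figures alone, the number of regulatory variants would be roughly 12% of the number of variants that fall in ATAC-seq peaks. This is an arithmetic consequence of the quoted proportions, not an additional result reported by the authors.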

600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges.

Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies we detected ~36% more REs than in short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, while DNA transposons and LINEs showed less technology-related bias. In most insect lineages, 25-85% of repetitive sequences were unclassified following automated annotation, compared to only ~13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress towards this goal.

Nucleosome repositioning in chronic lymphocytic leukaemia.

The location of nucleosomes in the human genome determines the primary chromatin structure and regulates access to regulatory regions. However, genome-wide information on deregulated nucleosome occupancy and its implications in primary cancer cells is scarce. Here, we conducted a genome-wide comparison of high-resolution nucleosome maps in peripheral-blood B cells from patients with chronic lymphocytic leukaemia (CLL) and healthy individuals at single base pair resolution. Our investigation uncovered significant changes of nucleosome positioning in CLL. Globally, the spacing between nucleosomes - the nucleosome repeat length (NRL) - was shortened in CLL. This effect was stronger in the more aggressive IGHV-unmutated than in the IGHV-mutated CLL subtype. Changes in nucleosome occupancy at specific sites were linked to active chromatin remodelling and reduced DNA methylation. Nucleosomes lost or gained in CLL marked differential binding of 3D chromatin organisers such as CTCF as well as immune response-related transcription factors and delineated mechanisms of epigenetic deregulation. The principal component analysis of nucleosome occupancy in cancer-specific regions allowed classification of samples between cancer subtypes and normal controls. Furthermore, patients could be better assigned to CLL subtypes according to differential nucleosome occupancy than based on DNA methylation or gene expression. Thus, nucleosome positioning constitutes a novel readout to dissect molecular mechanisms of disease progression and to stratify patients. Furthermore, we anticipate that the global nucleosome repositioning detected in our study, such as changes in the NRL, can be exploited for liquid biopsy applications based on cell-free DNA to stratify patients and monitor disease progression.
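
To make the nucleosome repeat length (NRL) concrete, the sketch below estimates it as the least-squares slope of dyad position against nucleosome index in a regularly phased array. This is only an illustration of the quantity, not the method used in the study: the function name is hypothetical, the dyad coordinates are invented toy values, and the approach assumes no nucleosome is missing from the array.

```python
def estimate_nrl(dyad_positions):
    """Estimate the nucleosome repeat length (bp) from dyad coordinates.

    Fits position = offset + NRL * index by ordinary least squares and
    returns the slope. Assumes a phased array with no missing nucleosomes.
    """
    xs = sorted(dyad_positions)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two nucleosome positions")
    mean_index = (n - 1) / 2
    mean_pos = sum(xs) / n
    numerator = sum((i - mean_index) * (x - mean_pos) for i, x in enumerate(xs))
    denominator = sum((i - mean_index) ** 2 for i in range(n))
    return numerator / denominator


# Invented toy arrays (NOT data from the study): evenly spaced dyads.
toy_longer_spacing = [100 + 190 * i for i in range(6)]
toy_shorter_spacing = [100 + 185 * i for i in range(6)]

nrl_longer = estimate_nrl(toy_longer_spacing)
nrl_shorter = estimate_nrl(toy_shorter_spacing)
print(nrl_longer, nrl_shorter, nrl_longer - nrl_shorter)
```

A shortened NRL, as reported for CLL, would show up in such a fit as a smaller slope. Genome-wide data are noisier than this toy case, so NRL is usually estimated from aggregate distance distributions rather than from a single array.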

Open Access
Deciphering D4Z4 CpG methylation gradients in fascioscapulohumeral muscular dystrophy using nanopore sequencing.

Fascioscapulohumeral muscular dystrophy (FSHD) is caused by a unique genetic mechanism that relies on contraction and hypomethylation of the D4Z4 macrosatellite array on the Chromosome 4q telomere allowing ectopic expression of the DUX4 gene in skeletal muscle. Genetic analysis is difficult because of the large size and repetitive nature of the array, a nearly identical array on the 10q telomere, and the presence of divergent D4Z4 arrays scattered throughout the genome. Here, we combine nanopore long-read sequencing with Cas9-targeted enrichment of 4q and 10q D4Z4 arrays for comprehensive genetic analysis including determination of the length of the 4q and 10q D4Z4 arrays with base-pair resolution. In the same assay, we differentiate 4q from 10q telomeric sequences, determine A/B haplotype, identify paralogous D4Z4 sequences elsewhere in the genome, and estimate methylation for all CpGs in the array. Asymmetric, length-dependent methylation gradients were observed in the 4q and 10q D4Z4 arrays that reach a hypermethylation point at approximately 10 D4Z4 repeat units, consistent with the known threshold of pathogenic D4Z4 contractions. High resolution analysis of individual D4Z4 repeat methylation revealed areas of low methylation near the CTCF/insulator region and areas of high methylation immediately preceding the DUX4 transcriptional start site. Within the DUX4 exons, we observed a waxing/waning methylation pattern with a 180-nucleotide periodicity, consistent with phased nucleosomes. Targeted nanopore sequencing complements recently developed molecular combing and optical mapping approaches to genetic analysis for FSHD by adding precision of the length measurement, base-pair resolution sequencing, and quantitative methylation analysis.

Open Access
An 11-point time course midgut transcriptome across 72 h after bloodfeeding provides detailed temporal resolution of transcript expression in the arbovirus vector, Aedes aegypti.

As the major vector for dengue, Zika, yellow fever, and chikungunya viruses, the mosquito Aedes aegypti is one of the most important insects in public health. These viruses are transmitted by bloodfeeding, which is also necessary for the reproduction of the mosquito. Thus, the midgut plays an essential role in mosquito physiology as the center for bloodmeal digestion and as an organ that serves as the first line of defense against viruses. Despite its importance, transcriptomic dynamics with fine temporal resolution across the entire digestion cycle have not yet been reported. To fill this gap, we conducted a transcriptomic analysis of A. aegypti female midguts across a 72-h bloodmeal digestion cycle for 11 time points, with a particular focus on the first 24 h. Principal component analysis confirmed that 72 h is indeed a complete digestion cycle. Cluster and GO enrichment analysis showed the orchestrated modulation of thousands of genes to accomplish the midgut's role as the center for digestion and nutrient transport, with a clear progression of sequential emphasis on transcription, translation, energy production, nutrient metabolism, transport, and finally, autophagy by 24-36 h. We further determined that many serine proteases are robustly expressed as if to prepare for unexpected physiological challenges. This study provides a powerful resource for the analysis of genomic features that coordinate the rapid and complex transcriptional program induced by mosquito bloodfeeding.

Open Access
Locus-resolution analysis of L1 regulation and retrotransposition potential in mouse embryonic development.

Mice harbor ∼2800 intact copies of the retrotransposon Long Interspersed Element 1 (L1). The in vivo retrotransposition capacity of an L1 copy is defined by both its sequence integrity and epigenetic status, including DNA methylation of the monomeric units constituting young mouse L1 promoters. Locus-specific L1 methylation dynamics during development may therefore elucidate and explain spatiotemporal niches of endogenous retrotransposition but remain unresolved. Here, we interrogate the retrotransposition efficiency and epigenetic fate of source (donor) L1s, identified as mobile in vivo. We show that promoter monomer loss consistently attenuates the relative retrotransposition potential of their offspring (daughter) L1 insertions. We also observe that most donor/daughter L1 pairs are efficiently methylated upon differentiation in vivo and in vitro. We use Oxford Nanopore Technologies (ONT) long-read sequencing to resolve L1 methylation genome-wide and at individual L1 loci, revealing a distinctive "smile" pattern in methylation levels across the L1 promoter region. Using Pacific Biosciences (PacBio) SMRT sequencing of L1 5' RACE products, we then examine DNA methylation dynamics at the mouse L1 promoter in parallel with transcription start site (TSS) distribution at locus-specific resolution. Together, our results offer a novel perspective on the interplay between epigenetic repression, L1 evolution, and genome stability.

Open Access
Comprehensive isoform-level analysis reveals the contribution of alternative isoforms to venom evolution and repertoire diversity.

Animal venom systems have emerged as valuable models for investigating how novel polygenic phenotypes may arise from gene evolution by varying molecular mechanisms. However, a significant portion of venom genes produce alternative mRNA isoforms that have not been extensively characterized, hindering a comprehensive understanding of venom biology. In this study, we present a full-length isoform-level profiling workflow integrating multiple RNA sequencing technologies, allowing us to reconstruct a high-resolution transcriptome landscape of venom genes in the parasitoid wasp Pteromalus puparum. Our findings demonstrate that more than half of the venom genes generate multiple isoforms within the venom gland. Through mass spectrometry analysis, we confirm that alternative splicing contributes to the diversity of venom proteins, acting as a mechanism for expanding the venom repertoire. Notably, we identified seven venom genes that exhibit distinct isoform usages between the venom gland and other tissues. Furthermore, evolutionary analyses of venom serpin3 and orcokinin reveal that the co-option of an ancient isoform and a newly evolved isoform, respectively, contributes to venom recruitment, providing valuable insights into the genetic mechanisms driving venom evolution in parasitoid wasps. Overall, our study presents a comprehensive investigation of venom genes at the isoform level, significantly advancing our understanding of alternative isoforms in venom diversity and evolution and setting the stage for further in-depth research on venoms.
