SAMtools Research Articles

Abstract

Study question: Is mitochondrial DNA content related to implantation outcome?

Summary answer: In single euploid embryo transfer, embryos with a higher mitochondrial DNA ratio were associated with higher pregnancy and implantation rates.

What is known already: Mitochondria are important organelles that generate energy during embryonic development. Recent literature suggests that mitochondrial content and function may be related to implantation success and embryo viability. Some studies have linked increased mitochondrial DNA ratios to aneuploidy, advanced maternal age, and implantation failure of euploid blastocysts, while others have failed to demonstrate similar findings.

Study design, size, duration: This retrospective cohort study covered 2016 to 2019 and included 1465 single embryo transfer cycles.

Participants/materials, setting, methods: Embryos were biopsied on Day 5 or Day 6, and the mitochondrial DNA ratio of 1465 embryos was measured during PGS/NGS. The ratios were normalized for technical batch-to-batch variation and compared between the implantation and non-implantation groups. Continuous variables were analyzed with Student's t-test and categorical variables with the Chi-square test.

Main results and the role of chance: The mitochondrial DNA ratio did not differ significantly across age groups (p = 0.772) or by ploidy (p = 0.224). Day 5 biopsied embryos, however, had a significantly higher mitochondrial DNA ratio than Day 6 biopsied embryos (p < 0.0001). All transferred embryos were classified as implanted or non-implanted. Among the 1465 transferred embryos, the mitochondrial DNA ratio of implanted embryos was significantly higher than that of non-implanted embryos (p = 0.0053). In addition, cut-off values were established to divide the transferred embryos into high and low mitochondrial DNA ratio groups. The high-ratio group had a higher pregnancy rate (74% vs. 63.5%, p = 0.0209) and a numerically higher implantation rate (57.3% vs. 50.8%, p = 0.1907) than the low-ratio group.

Limitations, reasons for caution: The mitochondrial DNA ratios were derived by bioinformatic processing of BAM and FASTQ files produced by MiSeq Reporter software (Illumina). Whether the results are reproducible on other sequencing platforms is unknown.

Wider implications of the findings: The relationship between mitochondrial function and transfer outcome remains unclear. This retrospective study finds an association between increased mtDNA content and increased implantation.

Trial registration number: Not applicable.
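The abstract does not say how the mitochondrial DNA ratio is computed from the BAM files. One common convention, assumed here purely for illustration, is the number of reads mapped to the mitochondrial genome divided by the number mapped to the nuclear genome. The sketch below applies that convention to text in the four-column layout printed by `samtools idxstats` (reference name, length, mapped reads, unmapped reads). The function name `mtdna_ratio`, the contig names, and the read counts are hypothetical and are not taken from the study; the batch normalization mentioned in the abstract is not modeled.

```python
def mtdna_ratio(idxstats_text, mito_names=("chrM", "MT")):
    """Return mapped mitochondrial reads / mapped nuclear reads.

    Assumption: this ratio definition is illustrative; the study's own
    definition and normalization are not given in the abstract.
    """
    mito = 0
    nuclear = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" row holds reads with no reference; skip it.
            continue
        if name in mito_names:
            mito += int(mapped)
        else:
            # Every non-mitochondrial contig counts as nuclear here,
            # including sex chromosomes.
            nuclear += int(mapped)
    if nuclear == 0:
        raise ValueError("no nuclear reads found")
    return mito / nuclear


# Made-up counts for illustration only (not data from the study).
EXAMPLE_IDXSTATS = (
    "chr1\t248956422\t1000\t0\n"
    "chr2\t242193529\t900\t0\n"
    "chrM\t16569\t19\t0\n"
    "*\t0\t0\t50\n"
)

ratio = mtdna_ratio(EXAMPLE_IDXSTATS)  # 19 / 1900
```

In practice the input text would come from running `samtools idxstats` on an indexed BAM file; comparing ratios across runs would additionally need the kind of batch normalization the abstract mentions.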


Background: Immense improvements in sequencing technologies make it possible to produce large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data. These data need to be processed efficiently for downstream analyses. Computing systems need these large amounts of data close to the processor (with low latency) for fast and efficient processing. However, existing workflows depend heavily on disk storage and access, so processing the data incurs large disk I/O overheads. Previously, the cost, volatility and other physical constraints of DRAM made it infeasible to keep large working data sets in memory. Recent developments in storage-class memory and non-volatile memory technologies now allow computing systems to hold very large data sets in memory and process them there directly, avoiding disk I/O bottlenecks. To exploit such memory systems efficiently, data must be placed in memory in a suitable format and accessed with high throughput, without (de)serialization and copy overheads between processes. For this purpose, we use Apache Arrow, a recently developed cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. This allows genomics applications written in different programming languages to communicate in memory without accessing disk storage and without (de)serialization and copy overheads.

Implementation: We integrate an Apache Arrow based in-memory Sequence Alignment/Map (SAM) format and its shared-memory object store library into widely used high-throughput genomics data processing applications such as BWA-MEM, Picard and GATK, allowing in-memory communication between these applications. In addition, this allows us to exploit the cache locality of tabular data and the parallel processing capabilities of shared memory objects.

Results: Our implementation shows that adopting an in-memory SAM representation in high-throughput genomics data processing applications results in better system resource utilization, fewer memory accesses due to high cache locality, and parallel scalability due to shared memory objects. Our implementation focuses on the GATK best practices recommended workflows for germline analysis on whole genome sequencing (WGS) and whole exome sequencing (WES) data sets. We compare a number of existing in-memory data placement and sharing techniques, such as ramDisk and Unix pipes, to show that the columnar in-memory data representation outperforms both. We achieve speedups of 4.85x and 4.76x for WGS and WES data, respectively, in the overall execution time of variant calling workflows. Similarly, speedups of 1.45x and 1.27x for these data sets, respectively, are achieved compared to the second fastest workflow. In some individual tools, particularly in sorting, duplicate removal and base quality score recalibration, the speedup is even more promising.

Availability: The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM.
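To make the idea of a columnar SAM representation concrete, here is a minimal sketch in plain Python. It does not use Apache Arrow or the ArrowSAM code. It only illustrates the layout idea: the eleven mandatory SAM fields are stored as one list per field instead of one record per read, so an operation such as coordinate sorting reads only the columns it needs. The names `sam_to_columns` and `sort_order_by_position` and the three example reads are hypothetical.

```python
# The eleven mandatory SAM fields, in file order.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def sam_to_columns(sam_text):
    """Convert row-oriented SAM text into a dict of per-field lists."""
    columns = {field: [] for field in SAM_FIELDS}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        parts = line.split("\t")
        # Optional tags after the 11th field are ignored in this sketch.
        for field, value in zip(SAM_FIELDS, parts[:11]):
            columns[field].append(int(value) if field in INT_FIELDS else value)
    return columns


def sort_order_by_position(columns):
    """Return row indices ordered by (RNAME, POS).

    Only two columns are touched. Note that real coordinate sorting
    orders references as listed in the header, not alphabetically.
    """
    return sorted(range(len(columns["POS"])),
                  key=lambda i: (columns["RNAME"][i], columns["POS"][i]))


# Three made-up alignments for illustration only.
EXAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "r1\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t16\tchr1\t100\t30\t4M\t*\t0\t0\tTTGA\tIIII\n"
    "r3\t0\tchr1\t200\t60\t4M\t*\t0\t0\tGGCA\tIIII\n"
)

columns = sam_to_columns(EXAMPLE_SAM)
order = sort_order_by_position(columns)
sorted_qnames = [columns["QNAME"][i] for i in order]  # r2, r3, r1
```

A real Arrow-based implementation would hold each column in a typed, contiguous buffer that several processes can share without copying, which is where the cache-locality and shared-memory benefits described in the abstract would come from. Python lists do not provide that; they only mirror the column-per-field organization.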


Articles published on SAMtools

MtDNA-Server 2: advancing mitochondrial DNA analysis through highly parallelized data processing and interactive analytics.

Technical-economic feasibility of installing a photovoltaic generation plant: a case study in a public university

Techno-economic feasibility of installing a photovoltaic power plant: a case study at a public university

GASOLINE: detecting germline and somatic structural variants from long-reads data

Sequence Alignment/Map format: a comprehensive review of approaches and applications.

A quality control portal for sequencing data deposited at the European genome-phenome archive.

Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene.

Whole Exome Sequencing Data Analysis for Detection of Breast Cancer Gene Variants and Pathway Study

O-087 Embryos with higher mitochondrial DNA ratios show better clinical outcomes in single euploid embryo transfer

Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing

SomatoSim: precision simulation of somatic single nucleotide variants

Transcriptome based high-throughput SSRs and SNPs discovery in the medicinal plant Lagenaria siceraria

Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR

Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.

Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework

IonCRAM: a reference-based compression tool for ion torrent sequence files

CONCUR: quick and robust calculation of codon usage from ribosome profiling data.

Identifying suitable tools for variant detection and differential gene expression using RNA-seq data

Novel bioinformatics quality control metric for next-generation sequencing experiments in the clinical context.

Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data
