GC Content Of DNA Research Articles

Background: Efficient DNA-based storage systems offer substantial capacity and longevity at reduced costs, addressing anticipated data growth. However, encoding data into DNA sequences is limited by two key constraints: 1) a maximum of $h$ consecutive identical bases (homopolymer constraint $h$), and 2) a GC ratio within $[0.5 - c_{GC}, 0.5 + c_{GC}]$ (GC content constraint $c_{GC}$). Sequencing or synthesis errors tend to increase when these constraints are violated.

Results: In this research, we address a pure source coding problem in the context of DNA storage, considering both homopolymer and GC content constraints. We introduce a novel coding technique that adheres to these constraints while maintaining linear complexity for increased block lengths and achieving near-optimal rates. We demonstrate the effectiveness of the proposed method through experiments on both randomly generated data and existing files. For example, when $h = 4$ and $c_{GC} = 0.05$, the rate reached 1.988, close to the theoretical limit of 1.990. The associated code can be accessed at GitHub.

Conclusion: We propose a variable-to-variable-length encoding method that does not rely on concatenating short predefined sequences, which achieves near-optimal rates.
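
The two constraints named in this abstract can be made concrete with a small checker. The sketch below is illustrative only and is not the authors' encoder: the function names (`max_homopolymer_run`, `gc_ratio`, `satisfies_constraints`) and the floating-point tolerance are assumptions introduced here. It only tests whether a given sequence obeys a homopolymer limit `h` and a GC window of half-width `c_gc` around 0.5.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases (0 if empty)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def gc_ratio(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, h: int, c_gc: float) -> bool:
    """True if seq has no run longer than h and a GC ratio in [0.5 - c_gc, 0.5 + c_gc]."""
    seq = seq.upper()
    # Small tolerance (an assumption of this sketch) so that ratios sitting
    # exactly on the window boundary are not rejected by rounding error.
    eps = 1e-12
    return max_homopolymer_run(seq) <= h and abs(gc_ratio(seq) - 0.5) <= c_gc + eps


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGT", h=4, c_gc=0.05))
    print(satisfies_constraints("AAAAACGCG", h=4, c_gc=0.05))
```

With `h = 4` and `c_gc = 0.05`, the values used in the abstract's example, a sequence passes only if no base repeats more than four times in a row and its GC ratio lies between 0.45 and 0.55.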

A Gram-negative, aerobic, rod-shaped, non-motile, yellow-pigmented bacterium, KMM 9835T, was isolated from the sediment sample obtained from the Amur Bay of the Sea of Japan seashore, Russia. Phylogenetic analyses based on the 16S rRNA gene and whole genome sequences positioned the novel strain KMM 9835T in the genus Mariniflexile as a separate line sharing the highest 16S rRNA gene sequence similarities of 96.6% and 96.2% with Mariniflexile soesokkakense RSSK-9T and Mariniflexile fucanivorans SW5T, respectively, and similarity values of <96% to other recognized Mariniflexile species. The average nucleotide identity and digital DNA-DNA hybridization values between strain KMM 9835T and M. soesokkakense KCTC 32427T, Mariniflexile gromovii KCTC 12570T, M. fucanivorans DSM 18792T, and M. maritimum M5A1MT were 83.0%, 82.5%, 83.4%, and 78.3% and 30.7%, 29.6%, 29.5%, and 24.4%, respectively. The genomic DNA GC content of strain KMM 9835T was 32.5 mol%. The dominant menaquinone was MK-6, and the major fatty acids were iso-C15:0, iso-C15:1ω10c, and C15:0. The polar lipids of strain KMM 9835T consisted of phosphatidylethanolamine, two unidentified aminolipids, an unidentified phospholipid, and six unidentified lipids. A pan-genome analysis showed that the KMM 9835T genome encoded 753 singletons. The annotated singletons were more often related to transport protein systems (SusC), transcriptional regulators (AraC, LytTR, LacI), and enzymes (glycosylases). The KMM 9835T genome was highly enriched in CAZyme-encoding genes, the proportion of which reached 7.3%. Moreover, the KMM 9835T genome was characterized by a high abundance of CAZyme gene families (GH43, GH28, PL1, PL10, CE8, and CE12), indicating its potential to catabolize pectin. This may represent part of an adaptation strategy facilitating microbial consumption of plant polymeric substrates in aquatic environments near shorelines and freshwater sources. 
Based on the combination of phylogenetic and phenotypic characterization, the marine sediment strain KMM 9835T (=KCTC 92792T) represents a novel species of the genus Mariniflexile, for which the name Mariniflexile litorale sp. nov. is proposed.

Articles published on GC Content Of DNA

Efficient and low-complexity variable-to-variable length coding for DNA storage

Description and Whole-Genome Sequencing of Mariniflexile litorale sp. nov., Isolated from the Shallow Sediments of the Sea of Japan.

"Turbo-Charged" DNA Motors with Optimized Sequence Enable Single-Molecule Nucleic Acid Sensing.

Draft genome sequence of multi-drug resistant Klebsiella quasipneumoniae subsp. similipneumoniae isolated from a teaching hospital wastewater in South West, Nigeria.

Rhodoalgimonas zhirmunskyi gen. nov., sp. nov., a Marine Alphaproteobacterium Isolated from the Pacific Red Alga Ahnfeltia tobuchiensis: Phenotypic Characterization and Pan-Genome Analysis.

Increased mutation and gene conversion within human segmental duplications

Microbacterium elymi sp. nov., Isolated from the Rhizospheric Soil of Elymus tsukushiensis, a Plant Native to the Dokdo Islands, Republic of Korea.

Geoalkalibacter halelectricus SAP-1 sp. nov. possessing extracellular electron transfer and mineral-reducing capabilities from a haloalkaline environment.

Variovorax terrae sp. nov. Isolated from Soil with Potential Antioxidant Activity.

Genetic and Chemical Diversity of Edible Mushroom Pleurotus Species.

The complete chloroplast genome of Prunus tangutica (Batal.) Korsh

Characterization of the complete chloroplast genome of Astragalus galactites (Fabaceae)

Actinomyces capricornis sp. nov., isolated from the oral cavity of a Japanese serow.

A hybrid CNN-LSTM model for high resolution melting curve classification

The complete chloroplast genome sequence of medicinal plant: Peganum nigellastrum (Zygophyllaceae)

Characterization of the complete chloroplast genome of Dracocephalum heterophyllum (Lamiaceae)

Sequence determinants, function, and evolution of CpG islands.

The complete chloroplast genome sequence of medicinal plant: Dianthus chinensis (Caryophyllaceae)

Characterization of the complete chloroplast genome sequence of medicinal plant: Potentilla bifurca (Rosaceae)
