The MetaGens algorithm for metagenomic database lossy compression and subject alignment.

Gustavo Henrique Cervi, Claudia Elizabeth Thompson, Cecilia Dias Flores

doi:10.1093/database/baad053

Abstract

Advances in genetic sequencing techniques have led to the production of large volumes of data. The extraction of genetic material from a sample is one of the early steps of a metagenomic study. As these processes evolved, analysis of the sequenced data enabled the discovery of etiological agents and, as a corollary, the diagnosis of infections. One of the biggest challenges of the technique is the huge volume of data generated with each new technology. This work introduces an algorithm that may reduce the data volume, allowing faster DNA matching against reference databases. Using techniques such as lossy compression and a substitution matrix, it is possible to match nucleotide sequences without losing the subject. The lossy compression exploits the nature of DNA mutations, insertions and deletions, and the possibility that different sequences belong to the same subject. The algorithm can reduce the overall size of the database to 15% of the original size and, depending on the parameters, to as little as 5%. Although the matching step is the same as on other platforms, the algorithm is more sensitive because it ignores transitions and transversions, resulting in a faster way to obtain diagnostic results. In the first experiment, the algorithm ran 10 times faster than BLAST while maintaining high sensitivity. This performance gain can be extended by combining the algorithm with other techniques already used in other studies, such as hash tables. Database URL: https://github.com/ghc4/metagens.
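The abstract does not spell out the compression scheme, so the following is only a minimal sketch of the general idea of lossy matching, not the MetaGens implementation. It assumes one possible reduction: each base is collapsed to its chemical class (purine `R` or pyrimidine `Y`), which makes transitions (A↔G, C↔T) invisible to the matcher. The names `REDUCED`, `reduce_sequence` and `find_matches` are hypothetical, and the sketch does not handle transversions, insertions or deletions, nor does it reproduce the reported 15% or 5% database sizes.

```python
# Hypothetical illustration only; this is NOT the published MetaGens code.
# Assumption: collapse each base to purine (R) or pyrimidine (Y), so that
# transitions (A<->G, C<->T) no longer prevent a match.
REDUCED = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def reduce_sequence(seq: str) -> str:
    """Lossy step: map each base to its class; unknown symbols become N."""
    return "".join(REDUCED.get(base, "N") for base in seq.upper())


def find_matches(query: str, subject: str) -> list[int]:
    """Return every start offset where the reduced query occurs in the reduced subject."""
    q = reduce_sequence(query)
    s = reduce_sequence(subject)
    hits = []
    start = s.find(q)
    while start != -1:
        hits.append(start)
        start = s.find(q, start + 1)  # continue searching after this hit
    return hits


subject = "ACGTTGCA"
query = "GCAT"  # differs from subject[0:4] ("ACGT") only by transitions

exact_hit = query in subject             # exact string matching misses it
lossy_hits = find_matches(query, subject)  # the reduced alphabet recovers it
```

In this toy case the exact comparison finds nothing, while the reduced comparison reports a hit at offset 0. The trade-off is the usual one for lossy schemes: the smaller alphabet tolerates some mutations and needs fewer bits per base, but it also admits matches that an exact comparison would reject.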

Journal: Database: the journal of biological databases and curation
Publication Date: Aug 11, 2023
License type: CC BY 4.0
