A New Method of Gene Information Vectorization and its Application in Similarity Search

Qian Dong, Zhaogong Zhang, Zhiliang Zhang, Dayuan Zheng

doi:10.1088/1742-6596/1453/1/012071


Open Access

https://doi.org/10.1088/1742-6596/1453/1/012071


Abstract

With the progress of the Human Genome Project, ever more biological sequence data are being generated, and the analysis and processing of these data have driven the development of bioinformatics. Sequence similarity analysis is fundamental to bioinformatics: it allows information from known sequences to be used to study the structure, function, and evolutionary relationships of new, uncharacterized sequences. This paper compresses and retrieves a genome database using the dbSNP information of DNA. Amino acid characters are determined according to the rule that three bases (a codon) specify one amino acid, and redundant information is removed using the dbSNP information. The paper proposes, for the first time, a new compressed representation of biological sequences that reflects the strong correlation between SNP location information and the SNPs of each sample in the genome. Finally, it builds a complete approximate nearest-neighbor query system for biological sequences, which greatly reduces storage and computing overhead and improves query efficiency while maintaining retrieval accuracy. Experiments on a large gene database verify the accuracy and scalability of the method.
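The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a partial codon table, a hypothetical list of known SNP positions (`snp_positions`), a binary "differs from reference" encoding, and an exact brute-force Hamming search standing in for the paper's approximate query system. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Partial codon table for illustration only; the standard table has 64 entries.
CODON_TABLE: Dict[str, str] = {
    "ATG": "M", "TTT": "F", "TTC": "F", "GGA": "G", "GGC": "G",
    "GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K", "TAA": "*",
}


def translate(dna: str) -> str:
    """Map each complete codon (three bases) to one amino acid character.

    Codons missing from the partial table are written as 'X'.
    """
    usable = len(dna) - len(dna) % 3  # ignore a trailing incomplete codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def snp_vector(sample: str, reference: str, snp_positions: Sequence[int]) -> List[int]:
    """Keep only the known SNP positions (assumed encoding, not from the paper):
    1 where the sample differs from the reference, 0 where it matches."""
    return [int(sample[p] != reference[p]) for p in snp_positions]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query: Sequence[int], database: Dict[str, List[int]], k: int = 1) -> List[Tuple[str, int]]:
    """Exact brute-force k-nearest search by Hamming distance.

    A stand-in baseline: an approximate index would replace this loop.
    Ties are broken by sample name so the result is deterministic.
    """
    scored = [(name, hamming(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))[:k]


if __name__ == "__main__":
    reference = "ATGTTTGGAGCTAAA"
    snp_positions = [5, 8, 14]  # hypothetical SNP sites
    samples = {
        "s1": "ATGTTCGGAGCTAAA",
        "s2": "ATGTTCGGCGCTAAG",
        "s3": reference,
    }
    database = {name: snp_vector(seq, reference, snp_positions) for name, seq in samples.items()}
    query = snp_vector("ATGTTCGGCGCTAAA", reference, snp_positions)
    print(translate(reference))
    print(nearest(query, database, k=2))
```

Each 15-base sample is reduced here to a 3-element vector, which illustrates why restricting the representation to SNP sites can cut storage and comparison cost; the actual compression ratio and index structure used in the paper are not stated in the abstract.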


Journal: Journal of Physics: Conference Series
Publication Date: Jan 1, 2020
License type: CC BY

