A new statistic for efficient detection of repetitive sequences.

Sijie Chen, Xuegong Zhang, Michael S Waterman, Yixin Chen, Fengzhu Sun

doi:10.1093/bioinformatics/btz262

Abstract

Detecting sequences that contain repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks, but an efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic, D2R, that can efficiently discriminate sequences with repetitive regions from those without. Using the statistic, we developed an algorithm with linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats (CRISPR) regions in bacterial genomic or metagenomic sequences. Simulation and real-data experiments show that the method works well on both assembled sequences and unassembled short reads. The code is available at https://github.com/XuegongLab/D2R_codes under the GPL 3.0 license. Supplementary data are available at Bioinformatics online.
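
For readers unfamiliar with the D2 family mentioned above, the following is a minimal sketch of the classical D2 statistic, which is the inner product of the k-mer count vectors of two sequences. The abstract does not define D2R, so `self_match_score` below is a hypothetical, illustrative self-comparison score and should not be read as the authors' statistic. The function names and the choice of k are assumptions made for this example. Both functions need a single pass over the input to count k-mers with a hash table.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classical D2: sum over k-mers w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


def self_match_score(seq, k):
    """Hypothetical illustration, NOT the paper's D2R.

    Counts ordered pairs of distinct positions that share the same k-mer.
    A sequence with no repeated k-mer scores 0.
    """
    return sum(n * (n - 1) for n in kmer_counts(seq, k).values())


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))
    print(self_match_score("ACGTACGT", 3))
    print(self_match_score("ACGTTGCA", 3))
```

A sequence containing a duplicated segment yields a positive `self_match_score`, while a sequence whose k-mers are all distinct yields zero. A real detector would also need a null model to decide when a score is significantly high, which this sketch does not attempt.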


Journal: Bioinformatics (Oxford, England)
Publication Date: Apr 16, 2019
Citations: 4
License type: cc-by-nc


