Mining approximate patterns with frequent locally optimal occurrences

Atsuyoshi Nakamura, Ichigaku Takigawa, Hisashi Tosaka, Mineichi Kudo, Hiroshi Mamitsuka

doi:10.1016/j.dam.2015.07.002

Open Access

https://doi.org/10.1016/j.dam.2015.07.002

Journal: Discrete Applied Mathematics
Publication Date: Jul 30, 2015
Citations: 17
License type: publisher-specific-oa

Affiliation: Hokkaido University, Kyoto University

Abstract

We consider a frequent approximate pattern mining problem in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of the given string locally and optimally. For this problem, we propose a new algorithm that generates candidate patterns without duplication by using the suffix tree of the given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is limited to at most k. Under this setting, we present memory-efficient algorithms, in particular a candidate-based version that runs fast enough even on human chromosome sequences of more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments, we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real human chromosome data. The synthetic data experiments show that our proposed approach extracted the embedded patterns correctly and time-efficiently. In the real data experiments, we examined the centers of 100 clusters computed after grouping the patterns obtained by our k-gap-constrained versions (k = 0, 1, and 2); the results revealed that the regions of their occurrences coincided with about half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database.
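
To make the k-gap-constrained idea concrete, the following is a minimal sketch of local alignment scoring in which the alignment may use at most k gap symbols. It is not the authors' suffix-tree-based mining algorithm; it is a plain Smith-Waterman-style dynamic program with a gap budget. The function name `k_gap_local_score`, the scoring values (match +1, mismatch -1, gap -1), and the convention that each inserted or deleted symbol counts as one gap are all assumptions made for illustration and may differ from the definitions in the paper.

```python
def k_gap_local_score(pattern, text, k, match=1, mismatch=-1, gap=-1):
    """Best local alignment score using at most k gap symbols (illustrative).

    Assumptions (not taken from the paper): unit scores, and every single
    inserted or deleted symbol counts as one gap.
    """
    neg_inf = float("-inf")
    m, n = len(pattern), len(text)
    # score[g][i][j]: best score of an alignment ending at pattern[:i] and
    # text[:j] that uses exactly g gap symbols. The empty alignment
    # (score 0) is only available in the g = 0 layer.
    score = [[[0 if g == 0 else neg_inf for _ in range(n + 1)]
              for _ in range(m + 1)]
             for g in range(k + 1)]
    best = 0
    for g in range(k + 1):
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = match if pattern[i - 1] == text[j - 1] else mismatch
                # Extend along the diagonal without spending a gap.
                cand = score[g][i - 1][j - 1] + sub
                if g == 0:
                    # Local alignment: allow restarting from score 0.
                    cand = max(cand, 0)
                else:
                    # Spend one gap symbol, coming from the g - 1 layer.
                    cand = max(cand,
                               score[g - 1][i - 1][j] + gap,
                               score[g - 1][i][j - 1] + gap)
                score[g][i][j] = cand
                best = max(best, cand)
    return best
```

For example, with the hypothetical inputs `pattern = "AACCGGTT"` and `text = "AACCAGGTT"`, the ungapped setting (k = 0) can only align one of the two exact halves, while k = 1 lets the alignment skip the extra `A` in the text and cover the whole pattern.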

Similar Papers

In situ hybridization of DNA sequences in human metaphase chromosomes visualized by an indirect fluorescent immunocytochemical procedure
A.C Van Prooijen-Knegt ... M Van Der Ploeg
Experimental Cell Research | VOL. 141
01 Oct 1982

Performance analysis of frequent pattern mining based on multiple minimum thresholds
Heungmo Ryang ... Unil Yun
Journal of Korean Society for Internet Information | VOL. 14
31 Dec 2014

Closed frequent similar pattern mining: Reducing the number of frequent similar patterns without information loss
Ansel Y Rodríguez-González ... Enrique Munoz De Cote
Expert Systems With Applications | VOL. 96
09 Dec 2017

A Spatial–Spectral Adaptive Haze Removal Method for Visible Remote Sensing Images
Huanfeng Shen ... Quan Yuan
IEEE Transactions on Geoscience and Remote Sensing | VOL. 58
06 Mar 2020
