Modification of Valiant's Algorithm for the String-Matching Problem

Yuliya Alekseevna Susanina, Anna Nikitichna Yaveyn, Semyon Vyacheslavovich Grigorev

doi:10.15514/ispras-2020-32(2)-11

Open Access

https://doi.org/10.15514/ispras-2020-32(2)-11

Abstract

The theory of formal languages, and context-free grammars in particular, has been extensively studied and applied in different areas. For example, several approaches to recognition and classification problems in bioinformatics are based on searching for genomic subsequences with specific features that can be described by a context-free grammar. The string-matching problem can therefore be reduced to parsing, that is, verifying whether a subsequence can be derived in this grammar. Applications such as bioinformatics involve large amounts of data, so existing parsing techniques need to be improved. The most asymptotically efficient parsing algorithm applicable to any context-free grammar is the matrix-based algorithm proposed by Valiant. This paper presents a modification of Valiant's algorithm whose main advantage is that the parsing table can be divided into successively computed layers of disjoint submatrices, where each submatrix of a layer can be processed independently. Moreover, the approach is easily adapted to the string-matching problem. Our evaluation shows that the proposed modification retains all the benefits of Valiant's algorithm, especially the high performance achieved by using fast matrix multiplication methods. The modified version also avoids a large amount of redundant computation and speeds up substring search.

Highlights

The theory of formal languages, and context-free grammars in particular, has been extensively studied and applied in different areas.
Several approaches to recognition and classification problems in bioinformatics are based on searching for genomic subsequences with specific features that can be described by a context-free grammar.
The string-matching problem can be reduced to parsing, that is, verifying whether a subsequence can be derived in the grammar. Applications such as bioinformatics involve large amounts of data, so existing parsing techniques need to be improved.

Summary

Introduction

The theory of formal languages is actively studied and widely applied in many areas [2], above all in computer science, where it is used to describe programming languages. Characteristic features of secondary structure can be described by a context-free grammar [14, 15], which reduces the recognition and classification problem to parsing (deciding whether a given string belongs to the language defined by the grammar).

This paper proposes an algorithm that is a modification of Valiant's algorithm. The proposed approach partially solves the string-matching problem by simply stopping the algorithm once a certain layer has been filled. The algorithm is easily adapted to the string-matching problem and allows greater use of parallel techniques; its correctness is proved and an estimate is given. The evaluation shows that the proposed algorithm is not inferior in performance to Valiant's algorithm and can be applied effectively to the string-matching problem. The paper first introduces the basic definitions and describes Valiant's algorithm, on which the proposed modification is based.
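To make the idea of stopping after a given layer concrete, here is a minimal sketch. It is a plain CYK-style tabulation, not Valiant's matrix-multiplication algorithm and not the authors' modification: cells are filled in order of substring length, and the loop stops once substrings of length `max_len` have been processed. The function name `derivable_substrings`, the rule dictionaries, and the toy grammar of balanced parentheses are all assumptions made for illustration; the grammar is assumed to be in Chomsky normal form.

```python
from itertools import product


def derivable_substrings(text, terminal_rules, binary_rules, start, max_len):
    """Return sorted spans (i, j) such that text[i:j] derives from `start`.

    Only substrings of length <= max_len are examined (hypothetical helper).
    terminal_rules: dict mapping a character to the nonterminals A with A -> char
    binary_rules:   dict mapping a pair (B, C) to the nonterminals A with A -> B C
    """
    n = len(text)
    table = {}  # (i, j) -> set of nonterminals that derive text[i:j]

    # Layer 1: substrings of length 1.
    for i, ch in enumerate(text):
        table[i, i + 1] = set(terminal_rules.get(ch, ()))

    # Later layers, one per substring length; stop early at max_len.
    for length in range(2, min(max_len, n) + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):  # every split point of text[i:j]
                for b, c in product(table[i, k], table[k, j]):
                    cell |= binary_rules.get((b, c), set())
            table[i, j] = cell

    return sorted(span for span, nts in table.items() if start in nts)


# Toy grammar (assumption): balanced parentheses in Chomsky normal form.
# S -> S S | L R | L T,  T -> S R,  L -> '(',  R -> ')'
TERMINAL_RULES = {"(": {"L"}, ")": {"R"}}
BINARY_RULES = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "T"): {"S"},
    ("S", "R"): {"T"},
}

short_matches = derivable_substrings("(())(", TERMINAL_RULES, BINARY_RULES, "S", 2)
all_matches = derivable_substrings("(())(", TERMINAL_RULES, BINARY_RULES, "S", 5)
```

With `max_len=2` only the inner `()` at span `(1, 3)` is reported; raising the bound to 5 also reports `(())` at span `(0, 4)`. The bound plays the role of the layer at which the computation is stopped, so no work is spent on substrings longer than those being searched for.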

Terminology

Valiant's Algorithm

Modification of Valiant's Algorithm

The String-Matching Problem

Applying the Algorithm to the String-Matching Problem

Experimental Setup

Analysis of Results

Conclusion

Journal: Proceedings of the Institute for System Programming of the RAS
Publication Date: Jan 1, 2020
License type: cc-by
