Computing Maximal Lyndon Substrings of a String

Frantisek Franek, Michael Liut

doi:10.3390/a13110294

Abstract

There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal Lyndon substrings of the input string, and secondly, Franek et al. showed in 2017 a linear equivalence of sorting suffixes and sorting right-maximal Lyndon substrings of a string, inspired by a novel suffix sorting algorithm of Baier. In 2016, Franek et al. presented a brief overview of algorithms for computing the Lyndon array that encodes the knowledge of right-maximal Lyndon substrings of the input string. Among those presented were two well-known algorithms for computing the Lyndon array: a quadratic in-place algorithm based on the iterated Duval algorithm for Lyndon factorization, and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying the next smaller value algorithm to it. Duval’s algorithm works for strings over any ordered alphabet, while linear suffix sorting requires a constant or an integer alphabet. The authors at that time were not aware of Baier’s algorithm. In 2017, our research group proposed a novel algorithm for the Lyndon array. Though the proposed algorithm is linear in the average case and has O(n log n) worst-case complexity, it is interesting as it emulates the recursive approach of the fast Fourier transform algorithm and introduces τ-reduction, which might be of independent interest. In 2018, we presented a linear algorithm to compute the Lyndon array of a string inspired by Phase I of Baier’s algorithm for suffix sorting. This paper presents the theoretical analysis of these two algorithms and provides empirical comparisons of both of their C++ implementations with respect to the iterated Duval algorithm.
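To make the notion of a Lyndon array concrete, the following is a minimal Python sketch of the quadratic iterated Duval approach mentioned in the abstract. It is an illustration under stated assumptions, not the authors' C++ implementation: the function names `longest_lyndon_prefix` and `lyndon_array_iterated_duval` are hypothetical, positions are 0-based, and entry `i` of the result is taken to be the length of the longest Lyndon substring starting at position `i`.

```python
def longest_lyndon_prefix(s, start):
    """Length of the longest Lyndon prefix of s[start:].

    This is the first factor produced by Duval's factorization
    when it is run on the suffix beginning at `start`.
    """
    n = len(s)
    j = start + 1   # next character to examine
    k = start       # character it is compared against
    while j < n and s[k] <= s[j]:
        if s[k] < s[j]:
            k = start   # the prefix read so far is itself a Lyndon word
        else:
            k += 1      # still inside a repetition of the current Lyndon word
        j += 1
    return j - k        # period length = length of the first Lyndon factor


def lyndon_array_iterated_duval(s):
    """Lyndon array of s (hypothetical helper): one Duval pass per position.

    Each pass is linear in the remaining suffix, so the total is quadratic
    in the worst case; no auxiliary data structure is needed.
    """
    return [longest_lyndon_prefix(s, i) for i in range(len(s))]


if __name__ == "__main__":
    print(lyndon_array_iterated_duval("banana"))
```

The sketch works for any sequence whose elements can be compared with `<` and `<=`, which mirrors the remark that Duval's algorithm only needs an ordered alphabet.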

Highlights

In combinatorics on words, Lyndon words play a very important role.
The introduction of Baier’s suffix sort in 2015, and the subsequent recognition of its connection to right-maximal Lyndon substrings, showed that there is an elementary algorithm to compute the Lyndon array which, despite its original clumsiness, could eventually be refined to outperform any SSLA or BWLA implementation: any implementation of a suffix sorting-based scheme requires a full suffix sort and some additional processing, while Baier’s approach is “just” a partial suffix sort; see [23].
When the missing values of the incomplete Lyndon array of the input string can be computed in linear time, the overall execution of the algorithm is linear as well, and the average-case complexity will be shown to be linear in the length of the input string.

Summary

Introduction

In combinatorics on words, Lyndon words play a very important role. Lyndon words, a special case of Hall words, were named after Roger Lyndon, who was looking for a suitable description of the generators of free Lie algebras [1]. Bannai et al. presented an algorithm to compute all the runs in a string in linear time that requires the knowledge of all right-maximal Lyndon substrings of the input string with respect to an order of the alphabet and its inverse. The introduction of Baier’s suffix sort in 2015, and the subsequent recognition of its connection to right-maximal Lyndon substrings, showed that there is an elementary algorithm to compute the Lyndon array (elementary in the sense of not relying on a pre-processed global data structure such as a suffix array or a Burrows–Wheeler transform) which, despite its original clumsiness, could eventually be refined to outperform any SSLA or BWLA implementation: any implementation of a suffix sorting-based scheme requires a full suffix sort and some additional processing, while Baier’s approach is “just” a partial suffix sort; see [23]. The results of the empirical measurements of the performance of IDLA (the iterated Duval algorithm), TRLA (the τ-reduction algorithm), and BSLA (the algorithm inspired by Baier’s sort) on the datasets are presented in Section 7 in both tabular and graphical forms.
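The suffix sorting-based scheme referred to above (full suffix sort, then inverse suffix array, then next smaller value) can be sketched in a few lines of Python. This is only an illustration: the function name `lyndon_array_via_suffix_sort` is hypothetical, and the suffix array is built by naively sorting suffixes, which is not linear. A real implementation of the scheme would substitute a linear-time suffix sort at that step.

```python
def lyndon_array_via_suffix_sort(s):
    """Lyndon array via inverse suffix array + next smaller value (NSV).

    Assumption: suffixes are sorted naively here for brevity; this step is
    where a linear-time suffix sorting algorithm would be used in practice.
    """
    n = len(s)

    # Step 1: full suffix sort (naive stand-in for a linear algorithm).
    suffix_array = sorted(range(n), key=lambda i: s[i:])

    # Step 2: inverse suffix array, isa[i] = rank of the suffix starting at i.
    isa = [0] * n
    for rank, pos in enumerate(suffix_array):
        isa[pos] = rank

    # Step 3: next smaller value on isa, computed with a stack in linear time.
    # The longest Lyndon substring starting at i ends just before the first
    # later position whose suffix ranks lower than the suffix at i.
    lyndon = [0] * n
    stack = []
    for j in range(n):
        while stack and isa[stack[-1]] > isa[j]:
            i = stack.pop()
            lyndon[i] = j - i
        stack.append(j)
    while stack:            # no smaller value to the right: runs to the end
        i = stack.pop()
        lyndon[i] = n - i
    return lyndon


if __name__ == "__main__":
    print(lyndon_array_via_suffix_sort("banana"))
```

Steps 2 and 3 are the "additional processing" on top of the full suffix sort; the contrast drawn in the text is that Baier's approach needs only a partial sort instead of Step 1.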

Basic Notation and Terminology

Properties Preserved by τ-Reduction

The Complexity of TRLA

The Algorithm BSLA

Notation and Basic Notions of BSLA

The Refinement

Motivation for the Refinement

The Complexity of BSLA

Data and Measurements

Conclusions and Future Work


Journal: Algorithms	Publication Date: Nov 12, 2020
License type: CC BY 4.0
