Abstract

There are two reasons to have an efficient algorithm for identifying all right-maximal Lyndon substrings of a string: firstly, Bannai et al. introduced in 2015 a linear algorithm to compute all runs of a string that relies on knowing all right-maximal Lyndon substrings of the input string, and secondly, Franek et al. showed in 2017 a linear equivalence of sorting suffixes and sorting right-maximal Lyndon substrings of a string, inspired by a novel suffix sorting algorithm of Baier. In 2016, Franek et al. presented a brief overview of algorithms for computing the Lyndon array that encodes the knowledge of right-maximal Lyndon substrings of the input string. Among those presented were two well-known algorithms for computing the Lyndon array: a quadratic in-place algorithm based on the iterated Duval algorithm for Lyndon factorization and a linear algorithmic scheme based on linear suffix sorting, computing the inverse suffix array, and applying to it the next smaller value algorithm. Duval’s algorithm works for strings over any ordered alphabet, while for linear suffix sorting, a constant or an integer alphabet is required. The authors at that time were not aware of Baier’s algorithm. In 2017, our research group proposed a novel algorithm for the Lyndon array. Though the proposed algorithm is linear in the average case and has O(nlog(n)) worst-case complexity, it is interesting as it emulates the fast Fourier algorithm’s recursive approach and introduces τ-reduction, which might be of independent interest. In 2018, we presented a linear algorithm to compute the Lyndon array of a string inspired by Phase I of Baier’s algorithm for suffix sorting. This paper presents the theoretical analysis of these two algorithms and provides empirical comparisons of both of their C++ implementations with respect to the iterated Duval algorithm.

Highlights

  • In combinatorics on words, Lyndon words play a very important role

  • The introduction of Baier’s suffix sort in 2015 and the consequent realization of the connection to right-maximal Lyndon substrings brought up the realization that there was an elementary algorithm to compute the Lyndon array, and that, despite its original clumsiness, could be eventually refined to outperform any SSLA or BWLA implementation: any implementation of a suffix sorting-based scheme requires a full suffix sort and some additional processing, while Baier’s approach is “just” a partial suffix sort; see [23]

  • When the input string is such that the missing values of the incomplete Lyndon array of the input string can be computed in linear time, the overall execution of the algorithm is linear as well, and the average case complexity will be shown to be linear in the length of the input string

Read more

Summary

Introduction

Lyndon words play a very important role. Lyndon words, a special case of Hall words, were named after Roger Lyndon, who was looking for a suitable description of the generators of free Lie algebras [1]. They presented an algorithm to compute all the runs in a string in linear time that requires the knowledge of all right-maximal Lyndon substrings of the input string with respect to an order of the alphabet and its inverse. The introduction of Baier’s suffix sort in 2015 and the consequent realization of the connection to right-maximal Lyndon substrings brought up the realization that there was an elementary (not relying on a pre-processed global data structure such as a suffix array or a Burrows–Wheeler transform) algorithm to compute the Lyndon array, and that, despite its original clumsiness, could be eventually refined to outperform any SSLA or BWLA implementation: any implementation of a suffix sorting-based scheme requires a full suffix sort and some additional processing, while Baier’s approach is “just” a partial suffix sort; see [23]. The results of the empirical measurements of the performance of IDLA, TRLA, and BSLA on those datasets are presented in Section 7 in both tabular and graphical forms

Basic Notation and Terminology
Properties Preserved by τ -Reduction
The Complexity of TRLA
The Algorithm BSLA
Notation and Basic Notions of BSLA
The Refinement
Motivation for the Refinement
The Complexity of BSLA
Data and Measurements
Conclusions and Future Work
Results
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call