Sample-Based Distance-Approximation for Subsequence-Freeness

Omer Cohen Sidon, Dana Ron

doi:10.1007/s00453-024-01233-4

Abstract

In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) $w = w_1 \ldots w_k$, a sequence (text) $T = t_1 \ldots t_n$ is said to contain $w$ if there exist indices $1 \le i_1 < \cdots < i_k \le n$ such that $t_{i_j} = w_j$ for every $1 \le j \le k$. Otherwise, $T$ is $w$-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is $\Theta(k/\epsilon)$. Denoting by $\Delta(T,w,p)$ the distance of $T$ to $w$-freeness under a distribution $p:[n]\rightarrow [0,1]$, we are interested in obtaining an estimate $\widehat{\Delta}$ such that $|\widehat{\Delta} - \Delta(T,w,p)| \le \delta$ with probability at least 2/3, for a given error parameter $\delta$. Our main result is a sample-based distribution-free algorithm whose sample complexity is $\tilde{O}(k^2/\delta^2)$. We first present an algorithm that works when the underlying distribution $p$ is uniform, and then show how it can be modified to work for any (unknown) distribution $p$. We also show that a quadratic dependence on $1/\delta$ is necessary.
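To make the definition of containment concrete, the following is a minimal Python sketch of an exact check for whether a fully known text contains a word as a subsequence. It only illustrates the definition; it is not the sample-based estimation algorithm of the paper, which sees only sampled positions of T. The function names `contains_subsequence` and `is_free` are hypothetical and chosen for this illustration.

```python
from typing import Sequence


def contains_subsequence(text: Sequence, word: Sequence) -> bool:
    """Return True if `word` occurs in `text` as a (not necessarily
    contiguous) subsequence, i.e. there are increasing indices
    i_1 < ... < i_k with text[i_j] == word[j] for every j."""
    matched = 0  # number of leading symbols of `word` matched so far
    for symbol in text:
        # Greedily match the next needed symbol at its earliest occurrence.
        if matched < len(word) and symbol == word[matched]:
            matched += 1
    return matched == len(word)


def is_free(text: Sequence, word: Sequence) -> bool:
    """Return True if `text` is w-free for w = `word`."""
    return not contains_subsequence(text, word)
```

For instance, `contains_subsequence("abcab", "aab")` holds because positions 1, 4, 5 of the text spell the word, while `is_free("bbaa", "ab")` holds because no `b` follows any `a`. Matching each symbol of the word at its earliest possible position is sufficient here, since any copy of the word can be shifted left to the greedy one.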

Full Text