Explicit construction of codes correcting a single reverse-complement duplication of arbitrary length
- Conference Article
75
- 10.1145/509907.510023
- May 19, 2002
We present an explicit construction of linear-time encodable and decodable codes of rate r which can correct a fraction (1 - r - ε)/2 of errors over an alphabet of constant size depending only on ε, for every 0 < r < 1 and arbitrarily small ε > 0. The error-correction performance of these codes is optimal as seen by the Singleton bound (these are "near-MDS" codes). Such near-MDS linear-time codes were known for decoding from erasures [2]; our construction generalizes this to handle errors as well. Concatenating these codes with good, constant-sized binary codes gives a construction of linear-time binary codes which meet the so-called Zyablov bound. In a nutshell, our results match the performance of the previously known explicit constructions of codes that had polynomial-time encoding and decoding, but in addition have linear-time encoding and decoding algorithms. We also obtain some results for list decoding targeted at the situation when the fraction of errors is very large, namely (1 - ε) for an arbitrarily small constant ε > 0. The previously known constructions of such codes of good rate over constant-sized alphabets either used algebraic-geometric codes, and thus suffered from complicated constructions and slow decoding, or, as in the recent work of the authors [9], had fast encoding/decoding but suffered from an alphabet size exponential in 1/ε. We present two constructions of such codes with rate close to Ω(ε²) over an alphabet of size quasi-polynomial in 1/ε. One of the constructions, at the expense of a slight worsening of the rate, can achieve an alphabet size which is polynomial in 1/ε. It also yields constructions of codes for list decoding from erasures which achieve new trade-offs. In particular, we construct codes of rate close to the optimal Ω(ε) which can be efficiently list decoded from a fraction (1 - ε) of erasures.
- Research Article
107
- 10.1109/tit.2005.855587
- Oct 1, 2005
- IEEE Transactions on Information Theory
We present an explicit construction of linear-time encodable and decodable codes of rate r which can correct a fraction (1 - r - ε)/2 of errors over an alphabet of constant size depending only on ε, for every 0 < r < 1 and arbitrarily small ε > 0. The error-correction performance of these codes is optimal as seen by the Singleton bound (these are "near-MDS" codes). Such near-MDS linear-time codes were known for the decoding from erasures; our construction generalizes this to handle errors as well. Concatenating these codes with good, constant-sized binary codes gives a construction of linear-time binary codes which meet the Zyablov bound, and also the more general Blokh-Zyablov bound (by resorting to multilevel concatenation). Our work also yields linear-time encodable/decodable codes which match Forney's error exponent for concatenated codes for communication over the binary symmetric channel. The encoding/decoding complexity was quadratic in Forney's result, and Forney's bound has remained the best constructive error exponent for almost 40 years now. In summary, our results match the performance of the previously known explicit constructions of codes that had polynomial-time encoding and decoding, but in addition have linear-time encoding and decoding algorithms.
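As a quick numeric aside on the tradeoff described in the two abstracts above: the Singleton bound caps the error fraction a rate-r code can uniquely correct at (1 - r)/2, and the near-MDS construction reaches (1 - r - ε)/2, a gap of only ε/2. A minimal sketch (parameter values are illustrative, not from the paper):

```python
# Compare the Singleton-bound error-correction limit (1 - r) / 2 with the
# fraction (1 - r - eps) / 2 achieved by the near-MDS codes described above.
# Purely illustrative; the parameter values are not from the paper.

def singleton_limit(r: float) -> float:
    """Max correctable error fraction for a rate-r code (unique decoding)."""
    return (1 - r) / 2

def near_mds_fraction(r: float, eps: float) -> float:
    """Error fraction corrected by the linear-time near-MDS construction."""
    return (1 - r - eps) / 2

r, eps = 0.5, 0.01
assert near_mds_fraction(r, eps) <= singleton_limit(r)
print(singleton_limit(r) - near_mds_fraction(r, eps))  # gap is eps / 2
```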
- Conference Article
3
- 10.4230/lipics.approx-random.2016.45
- Jan 1, 2016
A stochastic code is a pair of encoding and decoding procedures, where the encoding procedure receives a k-bit message m and a d-bit uniform string S. The code is (p, L)-list-decodable against a class C of functions from n bits to n bits if for every message m and every channel C in C that induces at most pn errors, applying decoding on the received word C(Enc(m, S)) produces a list of at most L messages that contains m with high probability (over the choice of uniform S). Note that neither the channel C nor the decoding algorithm Dec receives the random variable S. The rate of a code is the ratio between the message length and the encoding length, and a code is explicit if Enc and Dec run in time poly(n). Guruswami and Smith (J. ACM, to appear) showed that for all constants 0 < p < 1/2, epsilon > 0 and c > 1 there are Monte-Carlo explicit constructions of stochastic codes with rate R >= 1-H(p)-epsilon that are (p, L=poly(1/epsilon))-list decodable for size n^c channels. Monte-Carlo means that the encoding and decoding need to share a public uniformly chosen poly(n^c)-bit string Y, and the constructed stochastic code is (p, L)-list decodable with high probability over the choice of Y. Guruswami and Smith pose an open problem to give fully explicit (that is, not Monte-Carlo) codes with the same parameters, under hardness assumptions. In this paper we resolve this open problem, using a minimal assumption: the existence of poly-time computable pseudorandom generators for small circuits, which follows from standard complexity assumptions by Impagliazzo and Wigderson (STOC 97). Guruswami and Smith also asked for fully explicit unconditional constructions with the same parameters against O(log n)-space online channels. (These are channels that have space O(log n) and are allowed to read the input codeword in one pass.) We resolve this open problem.
Finally, we consider a tighter notion of explicitness, in which the running time of the encoding and list-decoding algorithms does not increase when increasing the complexity of the channel. We give explicit constructions (with rate approaching 1-H(p) for every p <= p_0 for some p_0 > 0) for channels that are circuits of size 2^{n^{Omega(1/d)}} and depth d. Here, the running time of encoding and decoding is a fixed polynomial (that does not depend on d). Our approach builds on the machinery developed by Guruswami and Smith, replacing some probabilistic arguments with explicit constructions. We also present a simplified and general approach that makes the reductions in the proof more efficient, so that we can handle weak classes of channels.
- Research Article
9
- 10.1007/s00037-020-00203-w
- Jan 20, 2021
- computational complexity
A stochastic code is a pair of encoding and decoding procedures (Enc, Dec) where \({{\rm Enc} : \{0, 1\}^{k} \times \{0, 1\}^{d} \rightarrow \{0, 1\}^{n}}\). The code is (p, L)-list decodable against a class \(\mathcal{C}\) of “channel functions” \(C : \{0,1\}^{n} \rightarrow \{0,1\}^{n}\) if for every message \(m \in \{0,1\}^{k}\) and every channel \(C \in \mathcal{C}\) that induces at most pn errors, applying Dec on the “received word” C(Enc(m,S)) produces a list of at most L messages that contains m with high probability over the choice of uniform \(S \leftarrow \{0, 1\}^{d}\). Note that neither the channel C nor the decoding algorithm Dec receives the random variable S when attempting to decode. The rate of a code is \(R = k/n\), and a code is explicit if Enc and Dec run in time poly(n). Guruswami and Smith (Journal of the ACM, 2016) showed that for all constants \(0 < p < \frac{1}{2}, \epsilon > 0\) and \(c > 1\) there exist a constant L and Monte Carlo explicit constructions of stochastic codes with rate \(R \geq 1-H(p) - \epsilon\) that are (p, L)-list decodable for size \(n^c\) channels. Here, Monte Carlo means that the encoding and decoding need to share a public uniformly chosen \({\rm poly}(n^c)\)-bit string Y, and the constructed stochastic code is (p, L)-list decodable with high probability over the choice of Y. Guruswami and Smith pose an open problem to give fully explicit (that is, not Monte Carlo) codes with the same parameters, under hardness assumptions. In this paper, we resolve this open problem, using a minimal assumption: the existence of poly-time computable pseudorandom generators for small circuits, which follows from standard complexity assumptions by Impagliazzo and Wigderson (STOC 97). Guruswami and Smith also asked for fully explicit unconditional constructions with the same parameters against \(O(\log n)\)-space online channels.
(These are channels that have space \(O(\log n)\) and are allowed to read the input codeword in one pass.) We also resolve this open problem. Finally, we consider a tighter notion of explicitness, in which the running time of the encoding and list-decoding algorithms does not increase when increasing the complexity of the channel. We give explicit constructions (with rate approaching \(1 - H(p)\) for every \(p \leq p_{0}\) for some \(p_{0} > 0\)) for channels that are circuits of size \(2^{n^{\Omega(1/d)}}\) and depth d. Here, the running time of encoding and decoding is a polynomial that does not depend on the depth of the circuit. Our approach builds on the machinery developed by Guruswami and Smith, replacing some probabilistic arguments with explicit constructions. We also present a simplified and general approach that makes the reductions in the proof more efficient, so that we can handle weak classes of channels.
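The rate bound \(R \geq 1 - H(p) - \epsilon\) in the abstracts above uses the binary entropy function H. A minimal numeric sketch (parameter values are illustrative, not from the papers):

```python
import math

# The rate bound R >= 1 - H(p) - eps from the stochastic-code results above,
# where H is the binary entropy function. Values below are illustrative.

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_lower_bound(p: float, eps: float) -> float:
    return 1 - binary_entropy(p) - eps

print(binary_entropy(0.5))            # 1.0: no positive rate as p -> 1/2
print(rate_lower_bound(0.11, 0.01))   # achievable rate guarantee
```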
- Research Article
25
- 10.1109/tit.2011.2136170
- Jun 1, 2011
- IEEE Transactions on Information Theory
Explicit code constructions for multiple-input multiple-output (MIMO) multiple-access channels (MAC) with K users are presented in this paper. The first construction is dedicated to the case of symmetric MIMO-MAC where all the users have the same number of transmit antennas nt and transmit at the same level of per-user multiplexing gain r. Furthermore, we assume that the users transmit in an independent fashion and do not cooperate. The construction is systematic for any values of K, nt and r. It is proved that this newly proposed construction achieves the optimal MIMO-MAC diversity-multiplexing gain tradeoff (DMT) provided by Tse at high-SNR regime. In the second part of the paper we take a further step to investigate the MAC-DMT of a general MIMO-MAC where the users are allowed to have different numbers of transmit antennas and can transmit at different levels of multiplexing gain. The exact optimal MAC-DMT of such channel is explicitly characterized in this paper. Interestingly, in the general MAC-DMT, some users might not be able to achieve their single-user DMT performance as in the symmetric case, even when the multiplexing gains of the other users are close to 0. Detailed explanations of such unexpected result are provided in this paper. Finally, by generalizing the code construction for the symmetric MIMO-MAC, explicit code constructions are provided for the general MIMO-MAC and are proved to be optimal in terms of the general MAC-DMT.
- Conference Article
5
- 10.1109/isit.2017.8007052
- Jun 1, 2017
Write-once memory (WOM) is a storage device consisting of binary cells which can only increase their levels. A t-write WOM code is a coding scheme which allows one to write t times to the WOM without decreasing the levels of the cells. The sum-rate of a WOM code is the ratio between the total number of bits written to the memory and the number of cells. It is known that the maximum sum-rate of a t-write WOM code is log(t + 1). This is also an achievable upper bound, both by information-theoretic arguments and by explicit WOM code constructions. While existing constructions of WOM codes were targeted at increasing the sum-rate, we consider here two more figures of merit in evaluating the constructions. The first one is the complexity of the encoding and decoding maps of the code. The second one is called the convergence rate, and is defined to be the minimum code length n(ε) required to be ε-close to a point in the capacity region. One of our main results in the paper is a specific capacity-achieving construction for two-write WOM codes which has polynomial complexity and a relatively short block length needed to be ε-close to the capacity. Using these two-write WOM codes, we obtain three-write WOM codes that approach sum-rate 1.809 with relatively short block lengths. Finally, we provide another construction of three-write WOM codes that achieves sum-rate 1.71 by using only 100 cells.
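To make the WOM model above concrete, here is a minimal sketch of the classic two-write WOM code of Rivest and Shamir (2 bits written twice into 3 cells, sum-rate 4/3). This is a textbook example for illustration, not the capacity-achieving construction of the paper:

```python
# Classic two-write WOM code: 2-bit messages, 2 writes, 3 binary cells that
# can only flip 0 -> 1. Illustrates the WOM constraint discussed above; it
# is NOT the paper's capacity-achieving construction.

FIRST = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
SECOND = {m: tuple(1 - b for b in c) for m, c in FIRST.items()}

def write1(m):
    """First write: message m as a weight <= 1 pattern."""
    return FIRST[m]

def write2(state, m):
    """Second write: keep the state if it already decodes to m, otherwise
    raise cells to the complement pattern for m (cells never decrease)."""
    if decode(state) == m:
        return state
    new = SECOND[m]
    assert all(b >= a for a, b in zip(state, new))  # monotone levels
    return new

def decode(c):
    """Weight <= 1 patterns use the first-write table, others the second."""
    table = FIRST if sum(c) <= 1 else SECOND
    return next(m for m, pat in table.items() if pat == c)

s = write1(2)        # first message: 2
s = write2(s, 1)     # overwrite with message 1 without lowering any cell
print(decode(s))     # 1
```

In total 4 bits are stored using 3 cells, giving the sum-rate 4/3; the log(t+1) bound for t = 2 is log 3 ≈ 1.585.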
- Conference Article
4
- 10.1109/focs.2019.00028
- Nov 1, 2019
We consider codes for space-bounded channels. This is a model for communication under noise that was studied by Guruswami and Smith (J. ACM 2016) and lies between the Shannon (random) and Hamming (adversarial) models. In this model, a channel is a space-bounded procedure that reads the codeword in one pass and modifies at most a p fraction of the bits of the codeword. Guruswami and Smith, and later work by Shaltiel and Silbak (RANDOM 2016), gave constructions of list-decodable codes with rate approaching 1 - H(p) against channels with space s = c log n, with encoding/decoding time poly(2^s) = poly(n^c). In this paper we show that for every constant 0 < p < 1/2 and every sufficiently small constant ε > 0, there are codes with rate R ≥ 1 - H(p) - ε and list size poly(1/ε), and furthermore: (i) our codes can handle channels with space s = n^Ω(1), which is much larger than the O(log n) achieved by previous work; (ii) we give encoding and decoding algorithms that run in time n · polylog(n), whereas previous work achieved large and unspecified poly(n) time (even for space s = 1 · log n channels); (iii) we can handle space-bounded channels that read the codeword in any order, whereas previous work considered channels that read the codeword in the standard order. Our construction builds on the machinery of Guruswami and Smith (with some key modifications), replacing some nonconstructive codes and pseudorandom objects (that are found in exponential time by brute force) with efficient explicit constructions.
For this purpose we exploit recent results of Haramaty, Lee and Viola (SICOMP 2018) on pseudorandom properties of "t-wise independence + low-weight noise", which we quantitatively improve using techniques by Forbes and Kelly (FOCS 2018). To make use of such distributions, we give new explicit constructions of binary linear codes that have dual distance n^Ω(1) and are also polynomial-time list-decodable from relative distance 1/2 - ε, with list size poly(1/ε). To the best of our knowledge, no such construction was previously known. Somewhat surprisingly, we show that Reed-Solomon codes with dimension k < √n have this property if interpreted as binary codes (in some specific interpretation), which we term "Raw Reed-Solomon Codes". A key idea is viewing Reed-Solomon codes as "bundles" of certain dual BCH codewords.
- Research Article
6
- 10.1109/tit.2019.2946483
- Oct 18, 2019
- IEEE Transactions on Information Theory
Write-once memory (WOM) is a storage device consisting of binary cells that can only increase their levels. A t-write WOM code is a coding scheme that makes it possible to write t times to a WOM without decreasing the levels of any of the cells. The sum-rate of a WOM code is the ratio between the total number of bits written to the memory during the t writes and the number of cells. It is known that the maximum possible sum-rate of a t-write WOM code is log(t+1). This is also an achievable upper bound, both by information-theoretic arguments and through explicit constructions. While existing constructions of WOM codes are targeted at the sum-rate, we consider here two more figures of merit. The first one is the complexity of the encoding and decoding maps. The second figure of merit is the convergence rate, defined as the minimum code length n(δ) required to reach a point that is δ-close to the capacity region. One of our main results in this paper is a capacity-achieving construction of two-write WOM codes which has polynomial encoding/decoding complexity while the block length n(δ) required to be δ-close to capacity is significantly smaller than in existing constructions. Using these two-write WOM codes, we then obtain three-write WOM codes that approach a sum-rate of 1.809 at relatively short block lengths. We also provide several explicit constructions of finite-length three-write WOM codes; in particular, we achieve a sum-rate of 1.716 by using only 93 cells. Finally, we modify our two-write WOM codes to construct ε-error WOM codes of high rates and small probability of failure.
- Research Article
5
- 10.1109/twc.2008.060606
- Feb 1, 2008
- IEEE Transactions on Wireless Communications
The design of space-time codes for frequency flat, spatially correlated MIMO fading channels is considered. The focus of the paper is on the class of space-time block codes known as linear dispersion (LD) codes, introduced by Hassibi and Hochwald. The LD codes are optimized with respect to the mutual information between the inputs to the space-time encoder and the output of the channel. The use of the mutual information as both a design criterion and a performance measure is justified by allowing soft decisions at the output of the space-time decoder. A spatial Fourier (virtual) representation of the channel is exploited to allow for the analysis of MIMO channels with quite general fading statistics. Conditions, known as generalized orthogonal conditions (GOC's), are derived for an LD code to achieve an upper bound on the mutual information, with the understanding that LD codes that achieve the upper bound, if they exist, are optimal. Explicit code constructions and properties of the optimal power allocation schemes are also derived. In particular, it is shown that optimal LD codes correspond to beamforming to a single virtual transmit angle at low SNR, and a necessary and sufficient condition for beamforming to be optimal is provided. Due to the nature of the code construction, it is further observed that the optimal LD codes can be designed to adapt to the statistics of different scattering environments. Finally, numerical results are provided to illustrate the optimal code design for three examples of sparse scattering environments. The performance of the optimal LD codes for these scattering environments is compared with that of LD codes designed assuming the i.i.d. Rayleigh fading (rich scattering) model, and it is shown that the optimal LD codes perform significantly better. The optimal LD codes are also compared to beamforming LD codes and it is shown that beamforming is nearly optimal over a range of SNR's of interest.
- Conference Article
87
- 10.1109/isit.2012.6283041
- Jul 1, 2012
MDS codes are erasure-correcting codes that can correct the maximum number of erasures given the number of redundancy or parity symbols. If an MDS code has r parities and no more than r erasures occur, then by transmitting all the remaining data in the code one can recover the original information. However, it was shown that in order to recover a single symbol erasure, only a fraction 1/r of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column, then the code forms a 2D array, and such codes are especially widely used in storage systems. In this paper, we ask the following question: given the length of the column l, can we construct high-rate MDS array codes with optimal repair bandwidth of 1/r whose code length is as long as possible? In this paper, we give code constructions such that the code length is (r + 1) log_r l.
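The repair-bandwidth arithmetic in the abstract above can be sketched in a few lines: an (n, k) MDS code with r = n - k parities can repair one lost node by downloading a 1/r fraction of each surviving node's data, instead of reading k full nodes. The parameter values below are illustrative, not from the paper:

```python
from fractions import Fraction

# Repair bandwidth of an (n, k) MDS code with r = n - k parities:
# naive repair reads k whole nodes; optimal repair reads a 1/r fraction
# from each of the n - 1 surviving nodes. Illustrative numbers only.

def naive_repair(n, k, node_size):
    return k * node_size                       # read k full nodes, re-encode

def optimal_repair(n, k, node_size):
    r = n - k
    return (n - 1) * Fraction(node_size, r)    # 1/r fraction per helper

n, k, node_size = 14, 10, 1
print(naive_repair(n, k, node_size))     # 10
print(optimal_repair(n, k, node_size))   # 13/4
```

With 4 parities, optimal repair downloads 13/4 node-sizes versus 10 for the naive scheme.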
- Conference Article
6
- 10.1109/allerton.2015.7447097
- Sep 1, 2015
Regeneration codes with the exact-repair property for distributed storage systems are studied in this paper. For the exact-repair problem, the achievable points of the (α, β) tradeoff match the outer bound only for minimum storage regenerating (MSR), minimum bandwidth regenerating (MBR), and some specific values of n, k, and d. This tradeoff is characterized in this work for general (n, k, k) (i.e., k = d) for some range of per-node storage (α) and repair bandwidth (β). Rather than by explicit code construction, achievability of these tradeoff points is shown by proving the existence of exact-repair regeneration codes for any (n, k, k). More precisely, it is shown that an (n, k, k) system can be extended by adding a new node, randomly picked from some ensemble, and it is proved that, with high probability, the existing nodes together with the newly added one maintain the properties of exact-repair regeneration codes. The new achievable region improves upon the existing code constructions. In particular, this result provides a complete tradeoff characterization for an (n, 3, 3) distributed storage system for any value of n.
- Conference Article
316
- 10.1109/allerton.2009.5394538
- Sep 1, 2009
Erasure coding techniques are used to increase the reliability of distributed storage systems while minimizing storage overhead. Also of interest is minimization of the bandwidth required to repair the system following a node failure. In a recent paper, Wu et al. characterize the tradeoff between the repair bandwidth and the amount of data stored per node. They also prove the existence of regenerating codes that achieve this tradeoff. In this paper, we introduce Exact Regenerating Codes, which are regenerating codes possessing the additional property of being able to duplicate the data stored at a failed node. Such codes require low processing and communication overheads, making the system practical and easy to maintain. An explicit construction of exact regenerating codes is provided for the minimum bandwidth point on the storage-repair bandwidth tradeoff, relevant to distributed-mail-server applications. A subspace-based approach is provided and shown to yield necessary and sufficient conditions on a linear code to possess the exact regeneration property, as well as to prove the uniqueness of our construction. Also included in the paper is an explicit construction of regenerating codes for the minimum storage point for parameters relevant to storage in peer-to-peer systems. This construction supports a variable number of nodes and can handle multiple, simultaneous node failures. All constructions given in the paper are of low complexity, requiring low field size in particular.
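The two endpoints of the storage-repair bandwidth tradeoff mentioned above (MBR and MSR) have standard closed forms in the regenerating-codes literature (the cut-set formulas of Dimakis et al.; they are not stated in this abstract, so treat this as background). A minimal sketch, with M the file size, k the nodes needed for reconstruction, and d the helper nodes:

```python
from fractions import Fraction

# Standard cut-set formulas for the two extreme points of the regenerating-
# codes tradeoff (background from the literature, not from this abstract).
# M = file size, k = nodes for reconstruction, d = helpers per repair.
# Returns (alpha, gamma): per-node storage and total repair download.

def msr_point(M, k, d):
    """Minimum-storage regenerating: smallest alpha first."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))   # gamma = d * beta
    return alpha, gamma

def mbr_point(M, k, d):
    """Minimum-bandwidth regenerating: smallest gamma first."""
    alpha = Fraction(2 * M * d, k * (2 * d - k + 1))
    return alpha, alpha                        # at MBR, gamma equals alpha

print(msr_point(12, 3, 4))   # (Fraction(4, 1), Fraction(8, 1))
print(mbr_point(12, 3, 4))   # (Fraction(16, 3), Fraction(16, 3))
```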
- Research Article
10
- 10.3390/e19070364
- Jul 15, 2017
- Entropy
This paper investigates polar codes for the additive white Gaussian noise (AWGN) channel. The scaling exponent $\mu$ of polar codes for a memoryless channel $q_{Y|X}$ with capacity $I(q_{Y|X})$ characterizes the closest gap between the capacity and non-asymptotic achievable rates in the following way: For a fixed $\varepsilon \in (0, 1)$, the gap between the capacity $I(q_{Y|X})$ and the maximum non-asymptotic rate $R_n^*$ achieved by a length-$n$ polar code with average error probability $\varepsilon$ scales as $n^{-1/\mu}$, i.e., $I(q_{Y|X})-R_n^* = \Theta(n^{-1/\mu})$. It is well known that the scaling exponent $\mu$ for any binary-input memoryless channel (BMC) with $I(q_{Y|X})\in(0,1)$ is bounded above by $4.714$, which was shown by an explicit construction of polar codes. Our main result shows that $4.714$ remains to be a valid upper bound on the scaling exponent for the AWGN channel. Our proof technique involves the following two ideas: (i) The capacity of the AWGN channel can be achieved within a gap of $O(n^{-1/\mu}\sqrt{\log n})$ by using an input alphabet consisting of $n$ constellations and restricting the input distribution to be uniform; (ii) The capacity of a multiple access channel (MAC) with an input alphabet consisting of $n$ constellations can be achieved within a gap of $O(n^{-1/\mu}\log n)$ by using a superposition of $\log n$ binary-input polar codes. In addition, we investigate the performance of polar codes in the moderate deviations regime where both the gap to capacity and the error probability vanish as $n$ grows. An explicit construction of polar codes is proposed to obey a certain tradeoff between the gap to capacity and the decay rate of the error probability for the AWGN channel.
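The scaling-exponent statement above says the gap to capacity of a length-n polar code shrinks like n^(-1/μ) with μ ≤ 4.714. A minimal numeric sketch of how slowly that gap closes (block lengths are illustrative):

```python
# Gap to capacity ~ n**(-1/mu) for polar codes, with the upper bound
# mu = 4.714 cited in the abstract above. Block lengths are illustrative.

MU = 4.714

def capacity_gap(n: int) -> float:
    """Order of the gap I(q) - R_n* for a length-n polar code."""
    return n ** (-1 / MU)

for n in (2**10, 2**20, 2**30):
    print(n, capacity_gap(n))
```

Because 1/μ ≈ 0.21, squaring the block length only shrinks the gap by a factor of about n^0.21, which is why the moderate-deviations regime discussed in the abstract is delicate.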
- Conference Article
6
- 10.1109/itw46852.2021.9457648
- Apr 11, 2021
We consider the rack-aware storage system where n = n̅u nodes are organized in n̅ racks each containing u nodes, and any k = k̅u + u_0 (0 ≤ u_0 < u) nodes can retrieve the original data file. More importantly, the cross-rack communication cost is much more expensive than the intra-rack communication cost, so that the latter is usually neglected in the system bandwidth. The MSRR (minimum storage rack-aware regenerating) code is an important variation of regenerating codes that achieves the optimal repair bandwidth for single-node failures in the rack-aware model. However, explicit constructions of MSRR codes for all parameters were not developed until Chen and Barg's work. In this paper we present another explicit construction of MSRR codes for all parameters that improves on Chen and Barg's construction in two aspects: (1) the sub-packetization is reduced from (d̅ - k̅ + 1)^n̅ to (d̅ - k̅ + 1)^⌈n̅/(u - u_0)⌉, where d̅ is the number of helper racks that participate in the repair process; (2) the field size is reduced to |F| > n, which is almost half of the field size used in Chen and Barg's construction. Besides, our code keeps the same access level as Chen and Barg's low-access construction.
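The sub-packetization improvement claimed above is an exponent reduction from n̅ to ⌈n̅/(u - u_0)⌉. A minimal sketch of the resulting savings (all parameter values below are made up for illustration):

```python
import math

# Sub-packetization comparison for MSRR codes, per the abstract above:
# old exponent n_bar versus new exponent ceil(n_bar / (u - u0)).
# Parameter values are illustrative, not from the paper.

def old_subpacketization(d_bar, k_bar, n_bar):
    return (d_bar - k_bar + 1) ** n_bar

def new_subpacketization(d_bar, k_bar, n_bar, u, u0):
    return (d_bar - k_bar + 1) ** math.ceil(n_bar / (u - u0))

n_bar, u, u0, d_bar, k_bar = 10, 4, 1, 8, 5
print(old_subpacketization(d_bar, k_bar, n_bar))         # 4**10 = 1048576
print(new_subpacketization(d_bar, k_bar, n_bar, u, u0))  # 4**4  = 256
```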
- Conference Article
2
- 10.1109/dasc/picom/datacom/cyberscitec.2018.00066
- Aug 1, 2018
An explicit construction of systematic MDS codes, called HashTag+ codes, with arbitrary sub-packetization level for all-node repair is proposed. It is shown that even for small sub-packetization levels, HashTag+ codes achieve the optimal MSR point for repair of any parity node, while the repair bandwidth for a single systematic node depends on the sub-packetization level. Compared to other codes in the literature, HashTag+ codes provide from 20% to 40% savings in the average amount of data accessed and transferred during repair.