A high-performance communication receiver requires reliable and accurate recovery of the transmitted source. In this paper, we propose a sequential decoding algorithm for the robust reception of sources with natural redundancy (NR) over the AWGN channel. To fully exploit the abundant NR in English text, used here as the example source, causal language modeling (CLM) with a Transformer decoder neural network (NN) is adopted to model the joint probability distribution of the source sequence. To retain the versatility of byte-level tokenization, the text is UTF-8 encoded; the corpus is the English edition of the Wiki-40B dataset, which is drawn from Wikipedia. The tree-search-based M-algorithm (MA) is integrated with CLM, yielding the CLM-MA algorithm, which combines the a priori probability of the information sequence with the likelihood values of the noise-corrupted symbols. Both simulation experiments and hardware platform evaluations are conducted to analyze the performance of the CLM-MA algorithm. With a sufficiently large value of M, a substantial performance gain is achievable, especially for high-efficiency modulation levels and extremely noisy conditions. Because the approach relies on a pre-trained NN model, it requires only self-supervised learning and eliminates the need for costly, accurate labels, which opens a new direction for communication receiver design.
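The way a tree search can combine a source prior with channel likelihoods may be easier to see in a small sketch. The Python code below is an illustration under stated assumptions, not the authors' implementation: the three-character alphabet, the bigram table `BIGRAM` (a stand-in for the Transformer CLM), BPSK modulation, and all function names are hypothetical. At each step every surviving path is extended by every candidate character, scored by log prior plus AWGN log-likelihood, and only the M best paths are kept.

```python
import math
import random

# Hypothetical three-character alphabet; the paper works on UTF-8 bytes.
ALPHABET = "ab "

# Hypothetical bigram table standing in for the Transformer CLM prior.
BIGRAM = {
    "a": {"a": 0.1, "b": 0.6, " ": 0.3},
    "b": {"a": 0.3, "b": 0.1, " ": 0.6},
    " ": {"a": 0.6, "b": 0.3, " ": 0.1},
}


def modulate(ch):
    """Map the 8 bits of a single-byte UTF-8 character to BPSK (0 -> +1, 1 -> -1)."""
    byte = ch.encode("utf-8")[0]
    return [1.0 - 2.0 * ((byte >> i) & 1) for i in range(7, -1, -1)]


def log_likelihood(rx, ch, sigma2):
    """AWGN log-likelihood of one received block given candidate ch, up to a constant."""
    return -sum((r - s) ** 2 for r, s in zip(rx, modulate(ch))) / (2.0 * sigma2)


def toy_log_prior(prefix, ch):
    """Log prior of ch given the decoded prefix (assumed toy model, not a real CLM)."""
    if not prefix:
        return math.log(1.0 / len(ALPHABET))
    return math.log(BIGRAM[prefix[-1]][ch])


def m_algorithm(rx_blocks, alphabet, log_prior, sigma2, M):
    """Breadth-first tree search that keeps the M best paths after each extension."""
    paths = [("", 0.0)]  # (decoded prefix, accumulated path metric)
    for rx in rx_blocks:
        candidates = [
            (prefix + ch,
             metric + log_prior(prefix, ch) + log_likelihood(rx, ch, sigma2))
            for prefix, metric in paths
            for ch in alphabet
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        paths = candidates[:M]  # prune to the M survivors
    return paths[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma2 = 1.0  # assumed noise variance per real BPSK symbol
    text = "ab ab a"
    rx_blocks = [
        [s + rng.gauss(0.0, math.sqrt(sigma2)) for s in modulate(c)] for c in text
    ]
    print(m_algorithm(rx_blocks, ALPHABET, toy_log_prior, sigma2, M=4))
```

In this sketch the cost per step grows with M times the alphabet size, which is why M acts as the knob between complexity and performance; with M = 1 the search reduces to a greedy symbol-by-symbol decision.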