The 32-bit floating-point (FP32) binary format, commonly used for data representation in computers, is highly complex: processing it requires powerful, expensive hardware and consumes considerable energy, which makes it unsuitable for sensor nodes, edge devices, and other devices with limited hardware resources. It is therefore often necessary to use binary formats of lower complexity than FP32. This paper proposes using a 24-bit fixed-point format, which reduces complexity in two ways: by decreasing the number of bits, and because the fixed-point format is significantly less complex than the floating-point format. The paper optimizes the 24-bit fixed-point format and examines its performance for data with the Laplacian distribution, exploiting the analogy between fixed-point binary representation and uniform quantization. First, the 24-bit uniform quantizer is optimized by deriving two new closed-form formulas that calculate its maximal amplitude very accurately. Then, the 24-bit fixed-point format is optimized by optimizing its key parameter and by proposing two adaptation procedures, with the aim of matching the performance of the optimal uniform quantizer over a wide range of input-data variance. It is shown that the proposed 24-bit fixed-point format achieves 18.425 dB higher performance than the floating-point format with the same number of bits, while being less complex.
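To illustrate the analogy between fixed-point representation and uniform quantization, the following Python sketch models a 24-bit fixed-point format as a symmetric uniform quantizer and estimates the signal-to-quantization-noise ratio (SQNR) on synthetic unit-variance Laplacian data. The bit layout (one sign bit, n integer bits, 23 - n fractional bits), the mid-rise reconstruction rule, the choice n = 5, the use of SQNR as the measure, and all function names are assumptions made for illustration. They are not taken from the paper, and the sketch does not implement its closed-form formulas or adaptation procedures.

```python
import math
import random


def uniform_quantize(x, x_max, bits=24):
    """Symmetric mid-rise uniform quantizer with 2**bits cells on [-x_max, x_max).

    Assumption: reconstruction at the cell midpoint; the paper's exact rule may differ.
    """
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    index = math.floor(x / step)
    # Clip overloaded inputs to the outermost cells.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step


def fixed_point_max_amplitude(n_integer_bits):
    """Maximal amplitude of the assumed layout: 1 sign bit, n integer bits,
    and 23 - n fractional bits, giving a step size of 2**(n - 23)."""
    return 2.0 ** n_integer_bits


def laplacian_sample(rng, sigma=1.0):
    """Draw one zero-mean Laplacian sample with standard deviation sigma."""
    scale = sigma / math.sqrt(2.0)  # Laplace scale parameter b
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


def sqnr_db(samples, x_max, bits=24):
    """Empirical signal-to-quantization-noise ratio in dB."""
    signal = sum(x * x for x in samples)
    noise = sum((x - uniform_quantize(x, x_max, bits)) ** 2 for x in samples)
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [laplacian_sample(rng) for _ in range(5000)]
    n = 5  # hypothetical number of integer bits
    print(f"n = {n}: SQNR = {sqnr_db(data, fixed_point_max_amplitude(n)):.2f} dB")
```

Under these assumptions, increasing n enlarges the maximal amplitude 2^n (less overload distortion) but also enlarges the step size 2^(n-23) (more granular distortion), which is the trade-off that optimizing the key parameter has to balance. For n = 5 and unit variance, overload is negligible and the standard high-resolution approximation of granular noise (step squared over 12) predicts an SQNR of roughly 119 dB; this is a property of the sketch, not a result from the paper.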