Current radio interferometers output multi-petabyte-scale volumes of data per year, making the storage, transfer, and processing of these data a sizeable challenge. This challenge is expected to grow with next-generation telescopes such as the Square Kilometre Array (SKA), which will produce a considerably larger data volume than current instruments. Lossy compression of interferometric data post-correlation can mitigate this challenge, but any drawbacks of the compression should be well understood in advance. Lossy data compression reduces the precision of the data, introducing additional noise. Since high-redshift (e.g., cosmic dawn or epoch of reionization) 21 cm studies impose strict precision requirements, we investigated the impact of this effect on the 21 cm signal power spectrum statistic in a bid to rule out unwanted systematics. We applied Dysco visibility compression, a normalization and quantization technique specifically designed for radio interferometric data, to observed visibility datasets from the LOFAR telescope as well as to simulated ones. We analyzed the power spectrum of these data and established the level of the compression noise in the power spectrum relative to the thermal noise. We also examined its coherence behavior using the cross-coherence metric. Finally, to identify optimal compression settings, we compared the compression noise obtained from different settings to a nominal 21 cm signal power. From a single night of observation, we find that the noise introduced by the compression is more than five orders of magnitude lower than the thermal noise level in the power spectrum. The noise does not affect calibration. Furthermore, it remains subdominant to the run-to-run noise that the nonlinear calibration algorithm introduces through random parameter initialization.
The compression noise shows no correlation with the sky signal and has no measurable coherent component, and therefore averages down optimally as more data are integrated. The level of compression error in the power spectrum ultimately depends on the compression settings. Dysco visibility compression is found to be an insignificant concern for 21 cm power spectrum studies. Hence, data volumes can be safely reduced by factors of $\sim 4$ with insignificant bias to the final power spectrum. Data from SKA-Low will likely be compressible by the same factor as data from LOFAR owing to the similarities of the two instruments. The same technique can be used to compress data from other telescopes, but a small adjustment of the compression parameters might be required.
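As a rough illustration of why normalizing and quantizing adds only a small, incoherent noise term, the following toy sketch quantizes simulated Gaussian noise samples at several bit depths. It is not the Dysco algorithm: the plain uniform quantizer, the single global RMS normalization, the clipping threshold `clip`, and all function names are assumptions made for this example only.

```python
import math
import random


def rms(values):
    """Root-mean-square of a sequence of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def quantize(values, bits, clip=5.0):
    """Toy lossy codec (assumed, not Dysco): normalize by the RMS,
    round onto a uniform grid of 2**bits levels, then decode."""
    scale = rms(values)
    levels = 2 ** bits
    step = 2.0 * clip / (levels - 1)
    decoded = []
    for v in values:
        x = max(-clip, min(clip, v / scale))  # normalize and clip
        code = round((x + clip) / step)       # integer code in [0, levels - 1]
        decoded.append((code * step - clip) * scale)
    return decoded


rng = random.Random(0)
thermal = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for noisy visibilities

error_ratio = {}   # compression-noise RMS relative to thermal-noise RMS
correlation = {}   # correlation between compression error and the data
for bits in (6, 8, 10):
    decoded = quantize(thermal, bits)
    err = [d - t for d, t in zip(decoded, thermal)]
    error_ratio[bits] = rms(err) / rms(thermal)
    cross = sum(e * t for e, t in zip(err, thermal)) / len(err)
    correlation[bits] = cross / (rms(err) * rms(thermal))

for bits in sorted(error_ratio):
    print(bits, error_ratio[bits], correlation[bits])
```

In this toy setup the error shrinks by roughly a factor of two per additional bit and stays essentially uncorrelated with the data, which is the qualitative behavior the abstract describes. The actual noise levels for LOFAR data depend on Dysco's own normalization and quantization settings.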