Massive multiple-input multiple-output (MIMO) has been shown to deliver improvements in both spectral efficiency and transmit energy efficiency. However, these improvements come at the cost of critical hardware design challenges due to the huge amount of data that must be processed in real time, in particular the storage of large channel state information (CSI) matrices. This paper presents an on-chip memory system for CSI that provides high area efficiency while supporting flexible access and high bandwidth. Cross-layer optimization spanning system, algorithm, and hardware is used to develop hardware-friendly compression algorithms that exploit propagation characteristics and large antenna-array features. More specifically, group-based and spatial-angular transform algorithms are implemented in a heterogeneous memory system, which consists of a unified memory for storing compressed CSI and a parallel memory for flexible access. Up to 75% of the memory can be saved for a 128-antenna system, at a performance loss of less than 0.8 dB. Implemented in ST 28 nm FD-SOI technology, the designed system has a capacity of 1.06 Mb, which is equivalent to a 4 Mb uncompressed memory and can store 100 $128\times 10$ channel matrices. The area is 0.47 mm$^2$, a 58% reduction compared with a memory system without CSI compression. With a supply voltage of 1.0 V, the memory system runs at 833 MHz, providing an 833 Gb/s access bandwidth.
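To illustrate the idea behind the spatial-angular transform compression referred to above, the following is a minimal sketch, not the authors' hardware design: for a channel with only a few propagation paths, a DFT across the 128-antenna dimension concentrates energy into a small number of angular bins, so only the strongest coefficients need to be stored. The `keep_ratio` parameter, the path model, and the function names are illustrative assumptions, chosen so that keeping roughly a quarter of the coefficients mirrors the ~75% memory saving quoted for the 128-antenna case.

```python
import numpy as np

def compress_csi(H, keep_ratio=0.25):
    """Illustrative spatial-angular compression of an (antennas x users) CSI matrix.

    Transforms H to the angular domain via a DFT over the antenna axis and
    keeps only the strongest coefficients (keep_ratio is a hypothetical knob,
    not a parameter from the paper).
    """
    A = np.fft.fft(H, axis=0) / np.sqrt(H.shape[0])   # spatial -> angular domain
    k = max(1, int(keep_ratio * A.size))
    idx = np.argsort(np.abs(A), axis=None)[-k:]       # indices of strongest coefficients
    return idx, A.ravel()[idx], H.shape

def decompress_csi(idx, vals, shape):
    """Rebuild an approximate CSI matrix from the stored angular coefficients."""
    A = np.zeros(int(np.prod(shape)), dtype=complex)
    A[idx] = vals
    A = A.reshape(shape)
    return np.fft.ifft(A * np.sqrt(shape[0]), axis=0)  # angular -> spatial domain

# Toy example: a 128-antenna, 10-user channel built from a few propagation
# paths per user, which is what makes the angular domain sparse.
N, K, P = 128, 10, 3
n = np.arange(N)[:, None]
H = sum(
    np.exp(1j * np.pi * np.sin(np.random.uniform(-np.pi / 2, np.pi / 2, K)) * n)
    * (np.random.randn(K) + 1j * np.random.randn(K)) / np.sqrt(2)
    for _ in range(P)
) / np.sqrt(P)

idx, vals, shape = compress_csi(H)
H_hat = decompress_csi(idx, vals, shape)
print("relative reconstruction error:",
      np.linalg.norm(H - H_hat) / np.linalg.norm(H))
```

The group-based part of the scheme and the heterogeneous unified/parallel memory organization are hardware-level choices described in the body of the paper and are not captured by this software sketch.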