The development of communication systems requires service customization in aspects such as standards, multiple-input multiple-output (MIMO) scales, and algorithms. Existing hardware designs for massive MIMO detection have difficulty combining high flexibility and scalability with high hardware efficiency. This article proposes a baseband processor based on a dynamic coarse-grained reconfigurable array (CGRA) for massive MIMO detection. To efficiently support various algorithm features and requirements, three optimization techniques are proposed to achieve high flexibility and scalability. First, an on-demand matrix–vector systolic array enables flexible and scalable matrix and vector operations, reducing memory accesses by 82%. Second, a distributed multi-interaction data storage is designed for flexible data access and reusability. Finally, a continuable adaptive context information format supports different bit widths, operations, and extensions of MIMO systems, reducing context information by 67%. Evaluated by removing one technique at a time from the proposed architecture, these techniques achieve improvements of 1.33$\times$, 1.34$\times$, and 1.29$\times$ in energy efficiency and 1.21$\times$, 1.18$\times$, and 1.18$\times$ in area efficiency. Fabricated in a 28-nm CMOS technology, the chip achieves high flexibility and scalability in supporting various detection algorithms; various MIMO scales, such as 4$\times$4, 32$\times$32, and 128$\times$8; and baseband processing tasks, such as filtering and the fast Fourier transform. When benchmarked on various detection algorithms, the processor achieves an energy efficiency of 1.64–2.92 Gb/s/W and an area efficiency of 0.25–0.43 Gb/s/MG, which are 2.78–28.54$\times$ and 2.05–14.43$\times$ those of state-of-the-art programmable designs, respectively.
To our knowledge, this is the first flexible and scalable CGRA-based baseband processor for massive MIMO detection.