Fusing three-dimensional (3D) and multispectral (MS) imaging data holds promise for high-throughput, comprehensive plant phenotyping that helps decipher genome-to-phenome relationships. However, acquiring high-quality 3D multispectral point clouds (3DMPCs) of plants remains challenging because of poor 3D data quality and the limited radiometric calibration methods available for plants with complex canopy structure. We proposed a novel 3D spatial-spectral data fusion approach that collects high-quality 3DMPCs of plants by integrating next-best-view (NBV) planning for adaptive data acquisition with a Neural REference Field (NeREF) for radiometric calibration. The approach was used to acquire 3DMPCs of perilla, tomato and rapeseed plants, which differ in plant architecture and leaf morphology, and was evaluated by the accuracy of chlorophyll content and equivalent water thickness (EWT) estimation. The completeness of plant point clouds collected with our approach improved by an average of 23.6% compared with using fixed viewpoints alone. The NeREF-based radiometric calibration with the hemispherical reference outperformed the conventional calibration method, reducing the root mean square error (RMSE) of the extracted reflectance spectra by 58.93%. The RMSE of chlorophyll content and EWT predictions decreased by 21.25% and 14.13%, respectively, when partial least squares regression (PLSR) was applied to the generated 3DMPCs. Our study provides an effective and efficient way to collect high-quality 3DMPCs of plants under natural light conditions. This improves the accuracy and comprehensiveness of phenotyping plant morphological and physiological traits and facilitates plant biology research, genetic studies and breeding programs.
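
The results above are reported as percentage reductions in RMSE. As a minimal sketch of how such a figure can be computed, the Python snippet below assumes the reduction is taken relative to the baseline RMSE, which the abstract does not state explicitly. The function names `rmse` and `relative_reduction` and all reflectance values are hypothetical and purely illustrative; they are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percent decrease of `improved` relative to `baseline` (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Illustrative reflectance values only (not measurements from the study).
reference = [0.10, 0.20, 0.30, 0.40]            # hypothetical ground-truth reflectance
conventional = [0.14, 0.16, 0.34, 0.36]         # hypothetical baseline calibration output
neural_calibrated = [0.11, 0.19, 0.31, 0.39]    # hypothetical improved calibration output

rmse_conventional = rmse(conventional, reference)
rmse_neural = rmse(neural_calibrated, reference)
print(f"RMSE reduction: {relative_reduction(rmse_conventional, rmse_neural):.2f}%")
```

Under this definition, a reduction of 58.93% would mean that the RMSE after calibration is roughly 41% of the RMSE obtained with the conventional method.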