Lithium, a rare metal of strategic importance, has attracted growing global attention. This study examines the laboratory visible-near infrared and short-wavelength infrared (VNIR-SWIR, 350–2500 nm) reflectance spectra of lithium-rich rocks and stream sediments in order to quantify their relationship with lithium concentration, and thereby to offer a new technical approach for exploring sedimentary lithium reserves. The study was conducted in the Tuanjie Peak region of Western Kunlun, Xinjiang, China, and analyzed 614 stream sediment samples and 222 rock specimens. Laboratory VNIR-SWIR reflectance spectra and lithium concentrations were measured first. After the spectra were preprocessed with Savitzky-Golay (SG) smoothing and continuum removal (CR), the absorption positions (Pos2210nm, Pos1910nm), the absorption depths (Depth2210, Depth1910), and the Illite Spectral Maturity (ISM) were extracted from the rock spectra. Wavelengths sensitive to lithium content were then selected with the Successive Projections Algorithm (SPA) and a genetic algorithm (GA). Using the wavelengths selected by these methods as inputs, quantitative regression models for lithium content in rocks and stream sediments were built with partial least squares regression (PLSR), support vector regression (SVR), and a convolutional neural network (CNN). Spectral analysis indicated that lithium is predominantly found in montmorillonite and illite, and that its content correlates positively with the spectral maturity of illite and is closely related to the Al-OH absorption depth (Depth2210) and clay content. SPA was more effective than GA at extracting lithium-sensitive bands.
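The preprocessing and feature-extraction steps named above (SG smoothing, continuum removal, then absorption position and depth) can be illustrated with a minimal sketch. It is not the authors' implementation: the 5-point quadratic SG window, the upper-convex-hull continuum, the 2150–2250 nm search window, and the synthetic spectrum are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
# The window size is an assumption; the source does not state it.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(y: Sequence[float]) -> List[float]:
    """Apply 5-point SG smoothing; the two points at each edge are left unchanged."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5))
    return out


def upper_hull(x: Sequence[float], y: Sequence[float]) -> List[int]:
    """Indices of the upper convex hull; x must be strictly increasing."""
    hull: List[int] = []
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:  # b lies on or below the chord a -> i, so drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def continuum_removed(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Divide reflectance by its hull continuum (reflectance assumed positive)."""
    hull = upper_hull(x, y)
    cr = [1.0] * len(x)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            t = (x[i] - x[a]) / (x[b] - x[a])
            continuum = y[a] + t * (y[b] - y[a])  # linear interpolation on the hull
            cr[i] = y[i] / continuum
    return cr


def absorption_feature(
    x: Sequence[float], cr: Sequence[float], lo: float, hi: float
) -> Tuple[float, float]:
    """Return (position, depth) of the deepest continuum-removed point in [lo, hi]."""
    idx = [i for i in range(len(x)) if lo <= x[i] <= hi]
    i_min = min(idx, key=lambda i: cr[i])
    return x[i_min], 1.0 - cr[i_min]


# Synthetic spectrum (illustrative only): flat 0.5 reflectance with a
# Gaussian absorption centred at 2210 nm, sampled every 10 nm.
wavelengths = list(range(2100, 2301, 10))
reflectance = [0.5 - 0.1 * math.exp(-(((w - 2210) / 20) ** 2)) for w in wavelengths]

cr_spectrum = continuum_removed(wavelengths, sg_smooth(reflectance))
pos, depth = absorption_feature(wavelengths, cr_spectrum, 2150, 2250)
print(f"Pos2210nm = {pos} nm, Depth2210 = {depth:.3f}")
```

Depth is taken here as one minus the minimum continuum-removed reflectance inside the search window, which is a common convention; the study may define Depth2210 and Depth1910 differently.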
The optimal regression model for predicting lithium content in rock samples was SG-SPA-CNN, with a prediction correlation coefficient (Rp) of 0.924 and a root-mean-square error of prediction (RMSEP) of 0.112. The optimal model for stream sediments was also SG-SPA-CNN, with an Rp of 0.881 and an RMSEP of 0.296. The higher prediction accuracy obtained for rocks than for sediments suggests that rocks are the more suitable medium for predicting lithium content. The CNN model outperformed the PLSR and SVR models for both sample types. Despite its limitations, this study highlights the effectiveness of hyperspectral technology for assessing the potential of clay-type lithium resources in the Tuanjie Peak area, and it offers new perspectives and approaches for further exploration.
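For readers unfamiliar with the two accuracy metrics reported above, the following sketch shows how they are commonly computed on a held-out prediction set. It assumes Rp is the Pearson correlation between measured and predicted values, which the abstract does not state explicitly; the function names and the sample values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def rmsep(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error of prediction over the prediction set."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def rp(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted values (assumed definition of Rp)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up values for illustration only.
measured_li = [1.0, 2.0, 3.0, 4.0]
predicted_li = [1.1, 1.9, 3.2, 3.8]
print(f"Rp = {rp(measured_li, predicted_li):.3f}, RMSEP = {rmsep(measured_li, predicted_li):.3f}")
```

A higher Rp together with a lower RMSEP indicates better agreement, which is the sense in which the rock model (0.924, 0.112) outperforms the stream sediment model (0.881, 0.296).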