Sorption by soil fundamentally governs the environmental fate of hydrophobic organic contaminants (HOCs), and it varies significantly with soil properties. A generalized approach to predicting HOC sorption by soils is therefore required. In this study, 488 data points were compiled from the literature and used to develop models for estimating the sorption capacity of phenanthrene in soils with six different machine learning (ML) approaches. The extreme gradient boosting (XGBT) model performed best, achieving a coefficient of determination (R²) of 0.91 and a root-mean-square error (RMSE) of 0.24 on the testing dataset. The XGBT model was further validated against experimental data from batch sorption tests conducted on 20 soil samples collected from 17 provinces of China. The differences between the predicted and experimental values were not significantly different from zero (p = 0.14). By combining the XGBT model with soil properties from the Harmonized World Soil Database, the distribution of sorption capacities in Chinese soils was mapped on a national scale. This research is expected to contribute to a deeper understanding of the migration of persistent organic pollutants in terrestrial systems. Furthermore, the established model may support more precise, science-based soil environmental management.
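
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how a coefficient of determination, a root-mean-square error, and a paired t statistic on prediction differences can be computed. It is illustrative only: the function names and the four sample values are hypothetical, not data from this study, and the paired t-test is an assumption, since the abstract does not name the statistical test behind p = 0.14.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def paired_t_statistic(observed, predicted):
    """t statistic for H0: mean(predicted - observed) == 0.

    Assumption: a paired t-test; the source does not specify the test used.
    """
    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)


# Hypothetical values for illustration only (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("R^2 :", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("t   :", paired_t_statistic(observed, predicted))
```

A t statistic close to zero, and hence a large p-value, indicates that the model shows no detectable systematic over- or under-prediction. It does not by itself show that individual predictions are accurate, which is why it is reported alongside R² and RMSE.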