Soil aggregate size distribution (ASD) is a key soil property for quantitative characterization of soil structure. Typically, dry and wet sieving methods are followed to measure mass fractions of soil aggregates under each aggregate size class and their cumulative fractions are used to estimate ASD parameters. In general, available soil ASD databases are small and a comprehensive assessment of existing models to describe ASD data is not available for soil aggregates. With objectives to comprehensively assess available ASD models, parameterize each model through pedotransfer functions (PTF), and generate ASD parameter maps for large region, we evaluated 26 models using 1373 soil ASD datasets generated from four large Indian states. The lognormal distribution-based Buchan model was observed to best describe most of our datasets. In fact, 1305 ASD datasets met the threshold criteria consisting of ‘coefficient of determination (R2) > 0.95′ and ‘corrected Akaike information criterion (AICc) < − 45′ for the Buchan model. Once ASD models were chosen, we used both soil properties and terrain attributes as predictors in the random forest (RF) and extreme gradient boosting models to develop PTFs for their parameters. To check whether soil properties estimated through digital soil mapping (DSM) can replace measured soil properties in the PTFs, we extracted soil properties from available DSM products. With measured soil properties and terrain attributes as predictors, RF-based PTFs yielded R2 values in the range of 0.55–0.76 for ASD parameters. Comparable performance was observed when measured soil properties were replaced by the DSM-estimated soil properties. Such a result enabled us to create national digital soil maps for ASD parameters based on DSM-derived soil properties for the first time.