Particle size distribution (PSD) is an important index property of granular materials. Conventional mechanical sieve analysis for PSD determination is labor-intensive and time-consuming. With the development of computer vision, image-based gradation prediction methods have gradually emerged. However, previous studies mainly focused on images in which individual aggregate particles are separated from each other without overlapping (i.e., not densely packed), which limits their applicability. This study proposes a novel framework for predicting the PSD of densely packed coarse aggregate particles, based on a visual foundation model and machine learning with appropriate feature engineering. First, the Segment Anything Model (SAM), a visual foundation model, was used for instance segmentation of the densely packed aggregate particles. Then, feature engineering was performed on the segmentation results. The major quantitative indicators related to particle size were extracted from each mask, such as the area, the diameter of the minimum enclosing circle, and the major axis of the equivalent ellipse of the particle contour. Subsequently, the cumulative distribution curves of these indicators were plotted and then sampled to generate the feature vectors. An artificial neural network (ANN) model was then trained for gradation prediction, using the feature vectors as inputs and data points extracted from the PSD curves as labels. The training dataset consisted of 8160 different sets of digitally generated particle samples with known PSDs. Finally, verification tests on five additional groups of digital particle samples and real coarse aggregate samples with known PSDs showed that the root mean squared error (RMSE) of the predicted aggregate sizes was 1.79 mm and 1.86 mm for the digital and real aggregate particle samples, respectively. 
Therefore, the efficacy and feasibility of the proposed method were validated for predicting the PSD of densely packed coarse aggregates from two-dimensional image information. Compared with Mask R-CNN combined with an empirical formula, which yielded RMSE values of 5.12 mm and 5.84 mm for the digital and real aggregate particle samples, respectively, the proposed framework demonstrates improved applicability of particle segmentation and higher PSD prediction accuracy.
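As an illustration of the feature-engineering and evaluation steps described above, the following minimal sketch shows one possible way to turn per-particle size indicators into a fixed-length feature vector and to compute the RMSE. It is not the implementation used in this study. The function names `cdf_feature_vector` and `rmse`, the count-based (unweighted) cumulative distribution, the evenly spaced sampling fractions, the linear interpolation, and the value of `n_points` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def cdf_feature_vector(values, n_points=10):
    """Sample the empirical cumulative distribution of a size indicator.

    `values` holds one indicator per segmented particle (for example the
    mask area or the minimum-enclosing-circle diameter). The function
    returns the indicator values at n_points evenly spaced cumulative
    fractions (1/n, 2/n, ..., 1), using linear interpolation between
    sorted samples. Assumption: every particle is weighted equally.
    """
    if not values:
        raise ValueError("values must be non-empty")
    xs = sorted(values)
    n = len(xs)
    features = []
    for k in range(1, n_points + 1):
        p = k / n_points            # cumulative fraction in (0, 1]
        pos = p * (n - 1)           # fractional index into the sorted values
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        features.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return features


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))
```

Under these assumptions, the vectors obtained for each indicator could be concatenated to form the ANN input, while `rmse` would compare predicted and sieve-measured particle sizes at matching passing percentages.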