Allocation of soil profiles to one or another soil class depends on the surveyor’s experience and knowledge. Inconsistent class designations can affect the downstream uses of soil information such as hydrological and ecological applications. Therefore, numerical classifications can be useful in creating property-based soil clusters. Quantitative classification of soil horizons can also be beneficial for guiding physicochemical thresholds used in taxonomic criteria. This study aimed at taking a step in applying numerical classification techniques to the South African soil legacy database to quantitatively cluster horizons. For each master horizon, the database was clustered into new “diagnostic” horizons through Partitioning Around Medoids (PAM) implemented in the Clustering Large Applications algorithm (CLARA). This algorithm has the benefit of using multiple subsets of the data making a more reliable initiation of medoids and it is a non-parametric approach making it more robust to outliers. To determine the optimal number of clusters, the silhouette width for 1 through 11 clusters was calculated. The number of clusters with the largest silhouette width was taken as the optimal number of clusters. The results of this study show that for O, E, and G (gleyed) horizons, the algorithm performs relatively well with the first two principle components capturing almost half of the variation in the data. However, for the A, B, and C horizons, the algorithm struggled to separate clusters sufficiently. The A horizon clusters only broadly corresponded to environmental factors (geography, climate, elevation, and geology) and the large clusters straddled numerous taxonomic thresholds. Moreover, the B horizon clusters correspond relatively well with environmental factors. Limitations identified, include bias in geographic location of modal soil profiles of the database, bias in the South African classification system, how to handle non-globular clusters, and outliers which add leverage to the clusters.