Papillary thyroid carcinoma (PTC) is one of the most common well-differentiated carcinomas of the thyroid gland. PTC nodules are often surrounded by a collagen capsule that prevents the spread of cancer cells. However, as the malignant tumor progresses, the integrity of this protective barrier is compromised, and cancer cells invade the surrounding tissue. The detection of capsular invasion is therefore crucial for diagnosis and the choice of treatment, and the development of new approaches that improve diagnostic performance is of great importance. In the present study, we used wide-field second harmonic generation (SHG) microscopy in combination with texture analysis and unsupervised machine learning (ML) to explore whether the collagen structure in the capsule can be characterized quantitatively and whether different capsule areas can be designated as intact, disrupted by invasion, or prone to invasion. Two-step k-means clustering showed that the collagen capsules in all analyzed tissue sections were highly heterogeneous and exhibited distinct segments described by characteristic ML parameter sets. These parameter sets allowed a structural interpretation of the collagen at the sites of overt invasion as fragmented, curled fibers that rarely form distributed networks. Clustering analysis also distinguished areas in the PTC capsule that were not categorized as invasion sites by the initial histopathological analysis but could be recognized as prospective micro-invasions after additional inspection. The characteristic features of suspicious and invasive sites identified by the proposed unsupervised ML approach can become a reliable complement to existing methods for diagnosing encapsulated PTC, increase the reliability of diagnosis, simplify decision making, and prevent human-related diagnostic errors.
In addition, the proposed automated ML-based selection of collagen capsule images and exclusion of non-informative regions can greatly accelerate and simplify the development of reliable methods for fully automated ML diagnosis that can be integrated into clinical practice.
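As an illustration only, the two-step clustering idea can be sketched as follows. The abstract does not specify the texture features, the number of clusters, or the selection rule, so everything below is an assumption: each image tile is represented by a hypothetical feature vector whose first column is the mean SHG intensity, step 1 uses a two-cluster split on that intensity to discard non-informative tiles, and step 2 groups the remaining tiles into a hypothetical three structural clusters. The function names `kmeans` and `two_step_kmeans` are ours, not the authors'.

```python
import numpy as np


def _assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def two_step_kmeans(features, intensity_col=0, k_structure=3, seed=0):
    """Two-step clustering of tile features (illustrative assumptions only).

    Step 1: split tiles into two clusters by SHG intensity and keep the
            brighter cluster as "informative" (assumed to contain collagen).
    Step 2: cluster the informative tiles on standardised features.
    Tiles rejected in step 1 receive the label -1.
    """
    labels1, cent1 = kmeans(features[:, [intensity_col]], 2, seed=seed)
    informative = labels1 == cent1[:, 0].argmax()

    X = features[informative]
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # z-score each feature
    labels2, _ = kmeans(X, k_structure, seed=seed)

    out = np.full(len(features), -1)
    out[informative] = labels2
    return out
```

Calling `two_step_kmeans(features)` on an array of shape `(n_tiles, n_features)` returns one integer label per tile. Which cluster, if any, corresponds to intact, invaded, or invasion-prone capsule segments would still have to be established against histopathological annotation.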