Growth monitoring of crops is a crucial aspect of precision agriculture, essential for optimal yield prediction and resource allocation. Traditional crop growth monitoring methods are labor-intensive and prone to errors. This study introduces an automated segmentation pipeline utilizing multi-date aerial images and ortho-mosaics to monitor the growth of cauliflower crops (Brassica Oleracea var. Botrytis) using an object-based image analysis approach. The methodology employs YOLOv8, a Grounding Detection Transformer with Improved Denoising Anchor Boxes (DINO), and the Segment Anything Model (SAM) for automatic annotation and segmentation. The YOLOv8 model was trained using aerial image datasets, which then facilitated the training of the Grounded Segment Anything Model framework. This approach generated automatic annotations and segmentation masks, classifying crop rows for temporal monitoring and growth estimation. The study’s findings utilized a multi-modal monitoring approach to highlight the efficiency of this automated system in providing accurate crop growth analysis, promoting informed decision-making in crop management and sustainable agricultural practices. The results indicate consistent and comparable growth patterns between aerial images and ortho-mosaics, with significant periods of rapid expansion and minor fluctuations over time. The results also indicated a correlation between the time and method of observation which paves a future possibility of integration of such techniques aimed at increasing the accuracy in crop growth monitoring based on automatically derived temporal crop row segmentation masks.