Current remote sensing technologies employing Unmanned Aerial Vehicles (UAVs) for farm monitoring have shown promise in characterizing the environment through diverse sensor systems, including hyperspectral cameras, LiDAR, thermal cameras, and RGB sensors. However, these solutions typically specialize in either activity recognition or crop monitoring, but not both. To address this limitation and enhance efficacy, we propose a multi-vision monitoring (MVM) framework capable of simultaneously recognizing farm activities and assessing crop health. Our approach applies computer vision techniques that convert aerial videos into sequential image frames from which essential environmental features are extracted. Central to our framework are two pivotal components: the Farmer Activity Recognition (FAR) algorithm and the Crop Image Analysis (CIA) component. The FAR algorithm introduces a novel feature extraction method that captures motion across multiple maps, yielding a distinct feature set for each activity. Meanwhile, the CIA component utilizes the normalized Triangular Greenness Index (nTGI) to estimate leaf chlorophyll levels, a key indicator of crop health. By unifying these components, we achieve dual functionality, activity recognition and crop health estimation, from identical input data, thereby enhancing efficiency and versatility in farm monitoring. Our framework employs a diverse range of machine learning models, demonstrating that the extracted features can address both tasks effectively and in unison.
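
To make the CIA pipeline concrete, the sketch below illustrates the two preprocessing steps described above: sampling frames from an aerial video and computing a per-pixel greenness map. It assumes the simplified broadband TGI approximation (TGI = G - 0.39R - 0.61B) and a min-max rescaling for the normalization; the paper's exact nTGI definition, frame sampling rate, and file names may differ, so this is an illustrative sketch rather than the framework's implementation.

```python
import cv2
import numpy as np


def video_to_frames(video_path, step=30):
    """Sample frames from an aerial video at a fixed interval (every `step` frames)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # BGR image as returned by OpenCV
        idx += 1
    cap.release()
    return frames


def normalized_tgi(frame_bgr):
    """Per-pixel Triangular Greenness Index, min-max scaled to [0, 1].

    Assumes the broadband approximation TGI = G - 0.39*R - 0.61*B; the
    normalization here is a simple min-max rescaling and may differ from
    the paper's exact nTGI formulation.
    """
    b, g, r = cv2.split(frame_bgr.astype(np.float32) / 255.0)
    tgi = g - 0.39 * r - 0.61 * b
    lo, hi = tgi.min(), tgi.max()
    if hi - lo < 1e-8:  # flat image: avoid division by zero
        return np.zeros_like(tgi)
    return (tgi - lo) / (hi - lo)


if __name__ == "__main__":
    # "aerial_clip.mp4" is a placeholder path, not a dataset from the paper.
    for frame in video_to_frames("aerial_clip.mp4", step=30):
        ntgi_map = normalized_tgi(frame)
        # Mean nTGI over the frame as a coarse proxy for canopy greenness.
        print(f"mean nTGI: {ntgi_map.mean():.3f}")
```

The per-frame nTGI map can then be aggregated (e.g., averaged over a crop region of interest) to track relative chlorophyll levels over time, while the same sampled frames feed the motion-based feature extraction used for activity recognition.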