Images captured by drones often must be preprocessed and stitched together because of inherent noise, narrow imaging swath, flying height, and angle of view. Conventional feature-based UAV image stitching techniques depend heavily on the quality of pixel-level feature detection and therefore frequently fail to stitch images with few features or low resolution. Later approaches addressed these shortcomings with deep learning-based stitching techniques that extract the global attributes of remote sensing images before stitching. However, because empty background regions are treated as stitching points, it remains difficult to distinguish livestock in a grazing area, and less information can be inferred from the surveillance data. This study proposes a four-stage object-based image stitching technique that classifies images of the grazing field and removes empty background regions before stitching. In the first stage, the drone-based image sequence of livestock on the grazing field is preprocessed. In the second stage, the images of cattle on the grazing field are classified to eliminate empty spaces and backgrounds. In the third stage, an improved SIFT detects the feature points of the classified images and obtains their feature point descriptors. Finally, the stitching area is computed using the image projection transformation.
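As a rough illustration of the third and fourth stages, the sketch below uses OpenCV's standard SIFT as a stand-in for the improved SIFT (whose modifications are not detailed here) and a RANSAC-estimated homography as the image projection transformation; the function name, parameters, and filenames are hypothetical, and the classified, background-free images from stage two are assumed as input.

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b, ratio=0.75, ransac_thresh=4.0):
    """Stitch img_b onto img_a via SIFT features and a RANSAC homography."""
    sift = cv2.SIFT_create()
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    kp_a, des_a = sift.detectAndCompute(gray_a, None)  # keypoints + descriptors
    kp_b, des_b = sift.detectAndCompute(gray_b, None)

    # Match descriptors and keep only unambiguous matches (Lowe's ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_b, des_a, k=2)
    good = [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 4:
        raise ValueError("too few matches to estimate a homography")

    # Estimate the projection mapping img_b's plane into img_a's frame.
    src = np.float32([kp_b[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)

    # Warp img_b onto a canvas large enough for both images, then overlay img_a.
    h_a, w_a = img_a.shape[:2]
    h_b, w_b = img_b.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (w_a + w_b, max(h_a, h_b)))
    canvas[:h_a, :w_a] = img_a
    return canvas

# Hypothetical usage on two consecutive classified drone frames.
mosaic = stitch_pair(cv2.imread("frame_001.jpg"), cv2.imread("frame_002.jpg"))
cv2.imwrite("mosaic.jpg", mosaic)
```

In practice, removing the empty background before this step (stage two) means SIFT keypoints concentrate on the livestock and other textured regions rather than on featureless pasture, which is the motivation the abstract gives for object-based stitching.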