Small object change detection (SOCD) from high spatial resolution (HSR) images has significant practical value in applications such as investigating illegal urban construction, yet it has received little research attention. This study proposes TMACNet, an SOCD model built on a multitask network architecture. The model converts the YOLOv8 network into a Siamese network and adds a feature difference branch (FDB), a temporal mutual attention layer (TMAL), and a contextual attention module (CAM) to merge differential and contextual features from different phases, enabling accurate extraction and analysis of small objects and their changes. To verify the proposed method, an SOCD dataset called YZDS is created from unmanned aerial vehicle (UAV) images of small-scale solar water heaters on rooftops. The experimental results show that TMACNet is strongly resistant to image registration errors and building height displacement, and that it avoids the propagation of object detection errors into change detection that occurs in overlay-based change detection. TMACNet also offers an enhanced approach to small object detection from the perspective of multitemporal information fusion. In the change detection task, TMACNet improves F1 by more than 5.96% over alternative change detection methods. In the object detection task, TMACNet outperforms single-temporal object detection models, improving the AP metric by approximately 1–3% while simplifying the technical process.
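
To make the contrast with overlay-based change detection concrete, the following minimal sketch shows how such a baseline might work and why it is sensitive to registration error for small objects. It is not the TMACNet method and is not taken from this study: the function names `iou` and `overlay_changes`, the 0.5 IoU threshold, the greedy matching strategy, and the example box coordinates are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels. Hypothetical layout.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlay_changes(
    boxes_t1: List[Box], boxes_t2: List[Box], iou_thr: float = 0.5
) -> Tuple[List[Box], List[Box]]:
    """Overlay independent detections from two phases.

    Returns (disappeared, appeared). A phase-1 box is greedily matched to
    the unmatched phase-2 box with the highest IoU; the pair counts as
    "unchanged" only if that IoU reaches iou_thr (an assumed threshold).
    """
    unmatched_t2 = list(boxes_t2)
    disappeared = []
    for b1 in boxes_t1:
        best = max(unmatched_t2, key=lambda b2: iou(b1, b2), default=None)
        if best is not None and iou(b1, best) >= iou_thr:
            unmatched_t2.remove(best)
        else:
            disappeared.append(b1)
    return disappeared, unmatched_t2


# The same 10 x 10 px object seen in both phases (illustrative values).
heater_t1 = (100.0, 100.0, 110.0, 110.0)
heater_t2_aligned = (100.0, 100.0, 110.0, 110.0)
# The same object, but phase 2 is misregistered by 5 px in x and y.
heater_t2_shifted = (105.0, 105.0, 115.0, 115.0)

print(overlay_changes([heater_t1], [heater_t2_aligned]))
print(overlay_changes([heater_t1], [heater_t2_shifted]))
```

With perfectly aligned boxes the overlay reports no change. With the 5 px shift, the IoU of the two 10 x 10 px boxes falls to 25/175 (about 0.14), so the unchanged object is reported as both a disappearance and an appearance. Any missed or spurious detection in either phase would pass into the change result in the same way, which is the error propagation that a jointly trained Siamese design aims to avoid.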