ABSTRACT
The agricultural Internet of Things has become one of the most important sources of agricultural big data. However, the wireless sensors deployed in the agricultural Internet of Things are affected by many factors, so anomalous data inevitably arise during data collection. Anomalous data degrade the quality of agricultural data and hinder efficient agricultural data analysis. To detect anomalous data from agricultural wireless sensors accurately, in this article we propose an anomaly detection model that combines multimodal fusion and error reconstruction. The model first adds Gaussian noise to the input sequence and converts it into standard time series data and image data. The two modalities are then processed by modality-specific encoders, and the image and sequence features are fused by a Temporal Cross-modal Attention module to enhance the perception of anomalous modal information. Finally, the image and sequence decoders reconstruct the fused data and identify anomalous data. Comparison experiments with several baselines demonstrate the effectiveness of the proposed model, ablation experiments confirm the necessity of its key modules, and additional sets of experiments examine the effects of the hyperparameters.
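The general idea of scoring anomalies by reconstruction error on a noise-perturbed input can be illustrated with a minimal sketch. This is not the proposed model: the learned encoders, the Temporal Cross-modal Attention module, and the decoders are replaced here by a simple moving-median filter as a stand-in, and the function names, the noise level `sigma`, the window size, and the mean-plus-`k`-standard-deviations threshold are all assumptions made for illustration.

```python
import numpy as np


def add_gaussian_noise(x, sigma=0.05, seed=0):
    """Perturb the input sequence with zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, size=x.shape)


def reconstruct(x, window=5):
    """Stand-in for a learned encoder/decoder: centered moving median.

    ASSUMPTION: a real model would learn this mapping from data.
    """
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


def detect_anomalies(x, sigma=0.05, window=5, k=3.0):
    """Flag points whose squared reconstruction error exceeds mean + k * std."""
    noisy = add_gaussian_noise(x, sigma=sigma)
    recon = reconstruct(noisy, window=window)
    err = (x - recon) ** 2  # error between the observed and reconstructed values
    threshold = err.mean() + k * err.std()  # hypothetical thresholding rule
    return np.flatnonzero(err > threshold), err


# Synthetic sensor-like signal with one injected spike at index 120.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
signal[120] += 5.0
anomaly_idx, errors = detect_anomalies(signal)
print(anomaly_idx)
```

In this toy setting the reconstruction follows the smooth trend and ignores the isolated spike, so the spike stands out in the error sequence. A learned multimodal model would play the role of `reconstruct`, with the thresholding step left to whatever rule the full method specifies.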