In precision agriculture, effective monitoring of corn pest regions is crucial to developing early scientific prevention strategies and reducing yield losses. However, complex backgrounds and small objects in real farmland bring challenges to accurate detection. In this paper, we propose an improved model based on YOLOv4 that uses contextual information and attention mechanism. Firstly, a context priming module with simple architecture is designed, where effective features of different layers are fused as additional context features to augment pest region feature representation. Secondly, we propose a multi-scale mixed attention mechanism (MSMAM) with more focus on pest regions and reduction of noise interference. Finally, the mixed attention feature-fusion module (MAFF) with MSMAM as the kernel is applied to selectively fuse effective information from additional features of different scales and alleviate the inconsistencies in their fusion. Experimental results show that the improved model performs better in different growth cycles and backgrounds of corn, such as corn in vegetative 12th, the vegetative tasseling stage, and the overall dataset. Compared with the baseline model (YOLOv4), our model achieves better average precision (AP) by 6.23%, 6.08%, and 7.2%, respectively. In addition, several comparative experiments were conducted on datasets with different corn growth cycles and backgrounds, and the results verified the effectiveness and usability of the proposed method for such tasks, providing technical reference and theoretical research for the automatic identification and control of pests.