Developing a high-precision wheat head detection algorithm is challenging because wheat heads in the field are densely distributed, vary widely in size, and are often severely occluded by weeds. In this work, we propose MFNet, a multi-scale feature enhancement network for wheat head detection and counting in complex scenes. First, we introduce a deformable spatial attention mechanism (DSAM) and embed it in the backbone network to strengthen the extraction of wheat head features while suppressing irrelevant ones, which improves the detection of occluded wheat heads. Second, we design a multi-scale receptive field feature fusion (MRFF) module and combine it with an improved lightweight feature pyramid module, which improves both the detection of wheat heads of different sizes and localization accuracy. Additionally, a modified detection head with deformable convolution adapts to the varied shapes of wheat head features and predicts bounding boxes accurately. Our method achieves 94.2% AP@50 at 30 FPS on the GWHD dataset. To verify its generalization, we constructed and annotated a dense wheat head detection (DWHD) dataset, conducted experiments on the DWHD and SPIKE datasets, and compared the results with state-of-the-art algorithms. The experimental results show that the proposed method outperforms most existing methods, confirming its superiority and robustness. They also demonstrate that the model adapts well to complex field environments with large differences in wheat head scale and severe occlusion, providing a technical reference for wheat phenotype monitoring.
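The headline metric, AP@50, counts a predicted box as a true positive when its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below is a minimal illustration under stated assumptions, not the evaluation code used for GWHD: it assumes a single class, boxes given as `(x1, y1, x2, y2)`, greedy matching of predictions in descending confidence order to the best unmatched ground truth, and all-point interpolation of the precision-recall curve. The function names `iou` and `ap_at_iou` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap_at_iou(preds: List[Tuple[Box, float]], gts: List[Box], thr: float = 0.5) -> float:
    """Single-class AP at an IoU threshold (thr=0.5 gives AP@50).

    preds: list of (box, confidence); gts: list of ground-truth boxes.
    Assumption: each prediction is matched to the best-overlapping unmatched GT.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Process predictions from most to least confident.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp_flags.append(1)  # true positive
        else:
            tp_flags.append(0)  # false positive
    # Cumulative precision and recall after each ranked prediction.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # All-point interpolation: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.7)]
    print(ap_at_iou(preds, gts))  # one false positive ranked between two hits
```

For a dataset-level figure such as the reported AP@50, predictions from all images are normally pooled and ranked together before the precision-recall curve is built; this per-image version only illustrates how the 0.5 IoU threshold enters the computation.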