Bird diversity plays an important role in ecological balance, so birdsong identification is of great practical significance. Spectrograms produced by feature extraction perform well in classification. However, the filtering applied during spectrogram generation discards part of the signal information, which limits the learning ability of birdsong recognition models. This study proposes a feature fusion network (MFF-ScSEnet) to address this problem. From each birdsong recording, a Mel-spectrogram, which emphasizes low-frequency features, is extracted with a Mel filter, and a Sinc-spectrogram, which emphasizes timbral features, is extracted with a SincNet filter; the two spectrograms are then combined with an early fusion strategy. The ScSEnet attention module is introduced into the ResNet18 backbone network to enhance the sound ripple information of the spectrogram, reduce the influence of spectrogram noise on recognition, and improve the recognition performance of the network. With the proposed MFF-ScSEnet, the accuracy on the self-built birdsong dataset (Huabei_dataset) and on the public datasets UrbanSound8K and Birdsdata reached 96.28%, 98.34%, and 96.66%, respectively. The results indicate that the proposed method outperforms recent birdsong recognition methods.
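As an illustration of why a Mel filter favors low-frequency content, and of one possible form of early fusion, the following minimal sketch may help. It is not the implementation used in this study. It assumes the common HTK-style Mel formula, and it assumes that early fusion means stacking the two spectrograms as input channels, which the text above does not specify. The function names (`hz_to_mel`, `mel_center_frequencies`, `early_fusion`) and the parameter values are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale (an assumed variant; other definitions exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of bands spaced evenly on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


def early_fusion(mel_spec, sinc_spec):
    """Stack two equally shaped spectrograms as channels: (2, bands, frames).

    Assumption: channel stacking is only one plausible reading of
    "early fusion"; the fusion used by MFF-ScSEnet may differ.
    """
    same_rows = len(mel_spec) == len(sinc_spec)
    same_cols = all(len(a) == len(b) for a, b in zip(mel_spec, sinc_spec))
    if not (same_rows and same_cols):
        raise ValueError("spectrograms must have the same shape")
    return [mel_spec, sinc_spec]


# Hypothetical settings: 8 bands between 0 Hz and 8 kHz.
centers = mel_center_frequencies(n_filters=8, f_min=0.0, f_max=8000.0)
# Gaps between neighboring centers widen with frequency, so more
# bands fall in the low-frequency range.
gaps = [hi - lo for lo, hi in zip(centers, centers[1:])]

# Toy 2-band x 3-frame spectrograms standing in for real features.
mel_spec = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
sinc_spec = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]
fused = early_fusion(mel_spec, sinc_spec)
```

Because the band centers are evenly spaced on the Mel scale, their spacing in Hz grows with frequency, which is the sense in which the Mel-spectrogram resolves low frequencies more finely. The fused two-channel array is the kind of input a convolutional backbone such as ResNet18 could consume after its first layer is adapted to two channels.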