Abstract
The accuracy of wind turbine gearbox fault diagnosis is compromised when fault features are not adequately extracted from operational vibration data. To improve fault identification efficiency and reduce the subjectivity of manual parameter setting, this paper introduces OCSSA-VMD, a mode decomposition algorithm that optimizes variational mode decomposition (VMD) with an osprey-Cauchy-sparrow search algorithm (OCSSA). The algorithm offers two key advantages. First, it automatically optimizes the VMD parameters, namely the number of modes k and the penalty factor α. Second, it selects the optimal intrinsic mode function (IMF) component according to the minimum envelope entropy principle and extracts 13 time-domain features from it. The mean impact value (MIV) algorithm then reduces the dimensionality of these features, yielding a multi-fault feature vector set for wind turbine gearbox vibration data. In addition, a fault diagnosis model, WOA-CNN-BiLSTM, is proposed that combines the whale optimization algorithm (WOA) with a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) network. Compared with other models, it achieves higher accuracy under insufficient-data conditions, reaching a fault classification accuracy of 98.3333% and a diagnosis accuracy of 98.3853%.
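The minimum envelope entropy criterion mentioned above can be illustrated with a short Python sketch. It assumes the definition commonly used in VMD parameter-optimization work, E_p = -Σ p_j ln p_j with p_j = a(j) / Σ a(j), where a(j) is the Hilbert envelope of a component. A sparse, impulsive envelope (typical of gear fault impacts) gives low entropy, while a flat envelope gives the maximum value ln N. The function names `analytic_signal`, `envelope_entropy`, and `select_optimal_imf` are hypothetical, and the three synthetic signals only stand in for VMD output. The VMD decomposition itself, the OCSSA search, the 13 time-domain features, and the MIV step are not shown.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0          # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)  # negative frequencies are zeroed

def envelope_entropy(x, eps=1e-12):
    """Shannon entropy of the normalized Hilbert envelope of x (assumed definition)."""
    envelope = np.abs(analytic_signal(x))
    p = envelope / (envelope.sum() + eps)
    p = p[p > 0]                       # skip zeros to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def select_optimal_imf(imfs):
    """Return (index, entropies) where index is the component with minimum envelope entropy."""
    entropies = [envelope_entropy(imf) for imf in imfs]
    return int(np.argmin(entropies)), entropies

# Synthetic stand-ins for VMD modes (hypothetical data, not gearbox measurements).
fs = 8192                              # sampling rate in Hz
t = np.arange(fs) / fs                 # 1 s of samples
rng = np.random.default_rng(0)

tone = np.sin(2 * np.pi * 50 * t)      # constant envelope: entropy close to ln(N)
noise = rng.standard_normal(t.size)    # broadband noise
impulses = np.zeros_like(t)            # fault-like periodic impacts every 0.05 s
for t0 in np.arange(0.0, 1.0, 0.05):
    tau = t - t0
    mask = tau >= 0
    impulses[mask] += np.exp(-200 * tau[mask]) * np.sin(2 * np.pi * 1000 * tau[mask])

imfs = [tone, noise, impulses]
idx, entropies = select_optimal_imf(imfs)
print(idx, [round(e, 3) for e in entropies])
```

In this toy setup the impulsive component should come out with the lowest envelope entropy, which is why the criterion tends to pick the mode that carries fault impacts. Whether the paper computes envelope entropy exactly this way, or uses it as the OCSSA fitness function, is an assumption here.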