Wind power forecasting is critical for optimizing energy use and ensuring the reliability of wind-based renewable energy systems. This paper introduces a novel method that combines the Grey Wolf Optimization (GWO) algorithm with data compression techniques to improve feature selection and reduce redundancy in wind speed prediction. GWO identifies essential features by grouping the dataset into intervals and analyzing their frequencies. Performance is evaluated using three compression measures (Rate DC-Miss, Rate DC-MEF, and Rate DC-BDG) and compared against other models, including extreme gradient boosting, space-time graph neural networks, and deep learning models. The results show significant improvements in accuracy and efficiency for wind speed prediction over existing techniques. The proposed approach addresses both larger datasets and the impact of noisy samples on prediction errors. Additionally, an MLDDR model is introduced to predict DC power generated from wind datasets; it comprises five stages: Data Preparation, Feature Selection, Data Compression, GRU-Based Predictions, and Rate of Reduction. In terms of data reduction, the original wind dataset (104857613) was reduced to 1093913 after processing missing values, a reduction rate of 0.136. Applying the MEF-GWO algorithm further reduced the dataset to 109395, a reduction rate of 0.385. The BDG dataset was compressed to 1805, a reduction rate of 0.607. In terms of prediction performance, the GRU model was evaluated on three datasets: the original, MEF, and BDG-GWO datasets. The GRU model achieved its highest accuracy (99.20%) on the BDG-GWO dataset, with a precision of 0.9965, a recall of 0.9978, and an F1 score of 0.9897. Training and testing times varied significantly, highlighting the computational challenges associated with deep learning techniques.
This research addresses both programming and application challenges. The programming challenges are the high computational demands of deep learning and the trial-and-error nature of its parameter determination, both of which are mitigated by using GWO. The application challenges involve reducing large datasets, grouping them into intervals, and evaluating performance using different compression measures. The main research questions are whether the GWO algorithm is suitable for reducing a dataset along both dimensions (features and records) and whether combining GWO with deep learning, specifically GRU, improves prediction results. The study concludes that the GWO-PCA and GRU combination significantly improves prediction accuracy and reduces implementation time.
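To make the idea of GWO-driven feature selection concrete, the following is a minimal sketch of a generic Grey Wolf Optimizer applied to a feature mask. It is not the MEF-GWO or BDG-GWO procedure evaluated in this paper. Everything in it is an illustrative assumption: the function names `fitness` and `gwo_select`, the synthetic data, the 0.5 threshold that turns a wolf position into a feature mask, the least-squares fitness with a feature-count penalty, and the choice to start one wolf with all features selected.

```python
import numpy as np

def fitness(mask, X, y, penalty=0.01):
    """Hypothetical fitness: least-squares MSE on the selected columns
    plus a small penalty proportional to the fraction of features kept."""
    if not mask.any():
        return float(np.var(y))  # no features: error of predicting the mean
    A = np.column_stack([X[:, mask], np.ones(len(y))])  # features + intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = float(np.mean((A @ coef - y) ** 2))
    return mse + penalty * mask.sum() / mask.size

def gwo_select(X, y, n_wolves=12, n_iter=30, seed=0):
    """Generic GWO over continuous positions in [0, 1]; a feature is
    considered selected when its coordinate exceeds 0.5 (assumption)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_wolves, d))
    pos[0] = 1.0  # assumption: one wolf starts with every feature selected
    best_mask, best_fit = None, np.inf
    for t in range(n_iter):
        fits = np.array([fitness(p > 0.5, X, y) for p in pos])
        order = np.argsort(fits)
        if fits[order[0]] < best_fit:  # remember the best mask seen so far
            best_fit = float(fits[order[0]])
            best_mask = pos[order[0]] > 0.5
        leaders = pos[order[:3]].copy()  # alpha, beta, delta wolves
        a = 2.0 * (1 - t / n_iter)  # control parameter, shrinks from 2 toward 0
        new_pos = np.zeros_like(pos)
        for leader in leaders:
            r1 = rng.random(pos.shape)
            r2 = rng.random(pos.shape)
            A = 2 * a * r1 - a
            C = 2 * r2
            D = np.abs(C * leader - pos)  # distance to this leader
            new_pos += leader - A * D
        pos = np.clip(new_pos / 3.0, 0.0, 1.0)  # average of the three moves
    return best_mask, best_fit

# Synthetic data (assumption): only features 0 and 3 influence the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

mask, fit = gwo_select(X, y)
print("selected feature indices:", np.flatnonzero(mask), "fitness:", round(fit, 4))
```

In a pipeline like the one described above, the fitness function would instead score a candidate feature subset by the downstream predictor's validation error, which is far more expensive per evaluation than the least-squares stand-in used here.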