Wind speed prediction, which allows wind power generation to be scheduled in advance, is of great significance for ensuring grid safety and improving wind energy availability. However, most existing wind speed prediction models extract spatial features insufficiently when predicting wind speed at wind turbine stations, which leads to unsatisfactory prediction results. To address this issue, an adaptive spatiotemporal feature fusion Transformer is proposed based on a graph attention network (GAT) and an optimizable graph matrix. First, a novel parameter optimization matrix is constructed from the geographic information of the turbines in a cluster, dynamic time warping (DTW), and the maximal information coefficient (MIC) to express the spatial correlation of wind speed among these turbines. Second, the GAT is used to extract spatial features from this matrix, thoroughly evaluating the spatial similarity of the wind speed series at different stations. Third, an embedding block is used to characterize the temporal features of the cluster wind speed. Fourth, to effectively integrate the spatial and temporal features into spatiotemporal features, a new type of entangle block based on parameter optimization is proposed. Fifth, predicted values are obtained from an improved Transformer, which extracts effective features from the spatiotemporal features with a multi-head attention mechanism. Finally, the Huber loss function is used to iteratively optimize the network parameters of the proposed model. To verify the effectiveness of the proposed model, five performance indices are employed: MAE, MSE, MAPE, TIC, and IA. The results show that the proposed model outperforms the compared models, namely GAT-GRU, GAT-LSTM, GAT-Transformer, GCN-GRU, GCN-LSTM, GCN-Transformer, and Autoformer.
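As an illustration of the training objective and the evaluation indices named above, the following is a minimal sketch in plain Python. It rests on assumptions that the abstract does not confirm: TIC is taken to be the Theil inequality coefficient and IA the index of agreement, both in their commonly used forms, and the Huber threshold `delta` and the function names `huber_loss` and `evaluate` are hypothetical. The exact definitions used by the proposed model may differ.

```python
import math


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it.

    `delta` is an assumed threshold; the abstract does not state its value.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)


def evaluate(y_true, y_pred):
    """Return MAE, MSE, MAPE (%), TIC and IA for two equal-length sequences.

    Assumes y_true contains no zeros (needed for MAPE) and is not constant
    (needed for IA). TIC and IA follow common textbook definitions, which
    is an assumption about the indices meant in the text.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 / n * sum(abs(e) / abs(t) for e, t in zip(errors, y_true))

    # Theil inequality coefficient: RMSE scaled by the RMS of both series.
    rms_true = math.sqrt(sum(t * t for t in y_true) / n)
    rms_pred = math.sqrt(sum(p * p for p in y_pred) / n)
    tic = math.sqrt(mse) / (rms_true + rms_pred)

    # Index of agreement: 1 means perfect agreement with the observations.
    mean_true = sum(y_true) / n
    denom = sum(
        (abs(p - mean_true) + abs(t - mean_true)) ** 2
        for t, p in zip(y_true, y_pred)
    )
    ia = 1.0 - sum(e * e for e in errors) / denom

    return {"MAE": mae, "MSE": mse, "MAPE": mape, "TIC": tic, "IA": ia}


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0]    # toy wind speeds in m/s, not real data
    predicted = [2.0, 5.0, 4.0]
    print(huber_loss(observed, predicted))
    print(evaluate(observed, predicted))
```

Under these definitions, lower MAE, MSE, MAPE, and TIC indicate a better forecast, while IA closer to 1 indicates better agreement. The Huber loss is less sensitive to occasional large errors than MSE because it penalizes them linearly rather than quadratically.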