The integration of large-scale wind power into the power grid threatens the stable operation of the power system. Traditional wind power prediction relies on time series alone and ignores the variability among wind turbines at different locations. This paper proposes a wind power probability density prediction method that accounts for spatio-temporal distribution, based on a time-variant deep feed-forward neural network (ForecastNet). First, outliers in the wind turbine data are detected with the isolation forest algorithm and repaired by Lagrange interpolation. Then, a graph attention mechanism extracts features from the neighboring nodes of each wind turbine in the wind farm, and these features are used to construct the input feature matrix. Finally, probability density predictions of wind power are obtained from ForecastNet using three different hidden-layer variants. The experimental results show that the variant whose hidden layer is an attention-based dense network (ADFN) gives the best predictions. Compared with the variant whose hidden layer is a multilayer perceptron, ADFN reduces the average prediction-interval width by 34.19%, 35.41%, and 35.17% at the three confidence levels considered, with interval coverage reaching the nominal confidence level in every case. For the three categories of wind turbines, ADFN also achieves relatively narrow average interval widths of 368.37 kW, 315.87 kW, and 299.13 kW, respectively.
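As a rough illustration of the data-cleaning step, the following dependency-free Python sketch scores each (wind speed, power) sample with a small isolation forest, using the standard score s = 2^(-E[h(x)]/c(ψ)), and replaces flagged power values by Lagrange interpolation over nearby valid samples. It is not the authors' implementation. The feature pair, the number of trees, the subsample size of 256, the score threshold of 0.6, the two-neighbor window on each side, and all helper names are hypothetical choices made for this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
EULER_GAMMA = 0.5772156649


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random):
    """Recursively split on a random feature at a random threshold inside its range."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for leaves left unsplit."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(points: List[Point], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Score s = 2^(-E[h(x)] / c(psi)); values near 1 indicate likely outliers.
    Assumes at least two points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-(sum(path_length(p, t) for t in trees) / n_trees) / norm)
            for p in points]


def lagrange_eval(xs: Sequence[float], ys: Sequence[float], t: float) -> float:
    """Evaluate the Lagrange polynomial through the nodes (xs, ys) at t."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (t - xj) / (xi - xj)
        total += yi * basis
    return total


def repair_outliers(values: Sequence[float], scores: Sequence[float],
                    threshold: float = 0.6, window: int = 2) -> List[float]:
    """Replace samples scoring above `threshold` (hypothetical cut-off) by Lagrange
    interpolation over up to `window` valid samples on each side in time."""
    flagged = [s > threshold for s in scores]
    valid = [i for i, bad in enumerate(flagged) if not bad]
    repaired = list(values)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        nodes = [j for j in valid if j < i][-window:] + [j for j in valid if j > i][:window]
        if nodes:  # leave the value unchanged if no valid neighbor exists
            repaired[i] = lagrange_eval([float(j) for j in nodes],
                                        [values[j] for j in nodes], float(i))
    return repaired


if __name__ == "__main__":
    # Hypothetical synthetic series: smooth wind speed and a cubic power curve (kW).
    speed = [8 + 3 * math.sin(t / 20) for t in range(200)]
    power = [2000 * ((v - 3) / 9) ** 3 for v in speed]
    power[100] = 10000.0  # injected sensor spike
    scores = isolation_scores(list(zip(speed, power)))
    cleaned = repair_outliers(power, scores)
    print("most anomalous index:", max(range(len(scores)), key=scores.__getitem__))
    print("repaired value at t=100:", round(cleaned[100], 2))
```

Interpolating from only a few neighbors on each side keeps the Lagrange polynomial low-order; fitting through many nodes at once would invite Runge-type oscillation between samples.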