The rapid development of intelligent transportation systems requires accurate short-term traffic flow prediction. However, the nonlinearity, uncertainty, and spatiotemporal variability of traffic flow make it difficult for traditional prediction methods to achieve satisfactory results. This study uses machine learning methods to improve prediction accuracy. The data are preprocessed by dimensionality reduction and interpolation of missing values, and the features are expanded with k-means clustering, whose clustering quality is evaluated with the silhouette coefficient method. Hyperparameters were optimized using 5-fold cross-validation and grid search, and Lasso regression, ridge regression, and random forest regression models were compared. The random forest model with feature expansion achieved the highest fitting coefficient and the smallest error. These results provide a useful reference for intelligent traffic flow prediction and leave room for further optimization in feature selection, model design, and traffic control strategies.
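As an illustration of the silhouette coefficient step mentioned above, the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the toy 2-D points, the farthest-point initialization, the candidate range of k, and the function names `kmeans` and `silhouette` are all assumptions made for the example.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (hypothetical): three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real traffic data, the cluster labels chosen this way could be appended as an extra feature before the regression models are tuned with 5-fold cross-validation and grid search. A library such as scikit-learn would normally replace these hand-written routines.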