Abstract. Wordle is a popular puzzle game whose players share their daily scores on Twitter. Based on this data, we developed a W-system to analyze and predict Wordle game outcomes, providing insights for further game development. The number of reported results varies from day to day. To explain this variation, we first cleaned the data reported on Twitter, then examined the autocorrelation function and ran an augmented Dickey-Fuller (ADF) test, which showed that the series was not stationary. After first-order differencing, the series was stationary. Using the Akaike information criterion (AIC), we selected ARIMA(1, 1, 0) as the optimal model. After checking the residuals and running the Ljung-Box (LB) test, we predicted the number of reports for the next 60 days and identified a decreasing trend. Additionally, we used a random forest model to investigate whether any attributes of the words affected the percentage of scores reported in Hard Mode. We found that certain vowels, such as A and E, made words easier, while I and O made words harder. To predict the distribution of reported results for a given future solution word, we designed a 24-5-7 fully connected neural network. The input layer takes a 24-dimensional vector representing letter occurrences, the hidden layer has five neurons, and the output layer produces a 7-dimensional vector representing the distribution of scores. We trained the model with a mean squared error (MSE) loss and the Adam optimizer, achieving a final test set loss of 7.22, which we take as an indication of good generalization ability. For example, for the word EERIE on March 1, 2023, the model predicted the distribution with reasonable confidence, taking the associated uncertainties into account. We also developed a model to classify solution words by difficulty. We identified eight attributes of words and used the Analytic Hierarchy Process (AHP) to assign a weight to each attribute. Applying the k-means clustering algorithm, we classified 355 words into three categories: easy, medium, and hard. Using our model, the word EERIE was classified as medium difficulty.
We validated the classification model by drawing a two-dimensional scatter plot of the clusters using the two features with the highest weights. The plot shows good separation among the three categories, which supports the accuracy of the classification.
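
The 24-5-7 network described above can be illustrated with a minimal NumPy sketch. Only the layer sizes and the MSE loss come from the abstract. The ReLU hidden activation, the softmax output scaled to percentages, the weight initialization, and all function names are assumptions made for illustration, and the Adam training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_in=24, n_hidden=5, n_out=7):
    """Create weights for a 24-5-7 fully connected network (sizes from the abstract)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Map a 24-dim letter-occurrence vector to a 7-dim score distribution in percent."""
    # Hidden layer with ReLU (assumed; the abstract does not name the activation).
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    # Softmax output scaled to 100 (assumed) so the seven entries sum to 100 percent.
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return 100.0 * p / p.sum(axis=-1, keepdims=True)

def mse(pred, target):
    """Mean squared error, the loss named in the abstract."""
    return float(np.mean((pred - target) ** 2))

def n_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(v.size for v in params.values())

params = init_params()
x = np.zeros(24)            # placeholder input, not a real word encoding
pred = forward(params, x)
print(n_parameters(params), pred.shape)
```

With these layer sizes the network has 24·5 + 5 + 5·7 + 7 = 167 trainable parameters, which is small relative to a few hundred solution words and may help explain why a single narrow hidden layer was chosen.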