Predicting emerging research topics is important for researchers and policymakers. In this study, we propose a two-step solution to the problem of emerging topic prediction. The first step forecasts, in a time-series manner, the future popularity score of each candidate topic; the popularity score is a novel indicator that reflects both a topic's impact and its growth. The second step selects novel topics from the candidates that the first step predicts will be popular. Terms with domain characteristics serve as candidate topics. Deep neural networks, specifically LSTM and NNAR, are applied with nine topic features to predict the popularity score. We evaluated these models and five baselines on two datasets from two perspectives: the ability to (1) predict the correct indicator value and (2) reconstruct the optimal ranking order. We compared two types of training strategy: a global strategy that trains a single model on all topics, and two local strategies that train separate models on different groups of topics. Our results show that LSTM and NNAR outperform the other models in predicting the value of the popularity score, as measured by MAE and RMSE, while LightGBM is a competitive baseline for ranking the topics in terms of NDCG@20. The performance difference between the global and local strategies is not significant. The emerging topics predicted by our approach are compared with those predicted by other methods. A qualitative assessment of the nominated emerging topics suggests that the topics nominated by the machine learning methods are more similar to one another than to those nominated by the rule-based model. A preliminary literature analysis indicates that some of the nominated topics are important. This study exploited the strengths of both machine learning and bibliometric indicator approaches for emerging topic prediction. Deep neural networks are applied where an objective optimization target can be defined and measured, while a bibliometric indicator offers an efficient way to select novel topics from the candidates.
The hybrid approach shows promise in considering various characteristics of emerging topics when making predictions.
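The abstract evaluates value prediction with MAE and RMSE and ranking quality with NDCG@20 but does not state the formulas. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. It assumes that NDCG ranks topics by predicted popularity score and uses the true popularity score as a linear gain with a log2 position discount; the paper may use another gain variant (for example 2^rel − 1). The function names `mae`, `rmse`, and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must describe the same, non-empty set of topics.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted popularity scores."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted popularity scores."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def ndcg_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: int = 20) -> float:
    """NDCG@k, assuming non-negative true scores used as linear gains."""
    _check(y_true, y_pred)

    def dcg(gains: Sequence[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2), so the top item has discount 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    # Rank topic indices by predicted score, highest first.
    predicted_order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    gains = [y_true[i] for i in predicted_order]
    # The ideal ranking orders topics by their true scores.
    ideal_dcg = dcg(sorted(y_true, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, `ndcg_at_k([3.0, 1.0, 2.0, 0.0], [2.5, 0.5, 2.8, 0.2])` is slightly below 1 because the model swaps the two most popular topics. A model can therefore have a low MAE and RMSE yet still lose NDCG if its small errors reorder the top of the list, which is consistent with the abstract's finding that the best value predictor and the best ranker can differ.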