Accurate forecasts of how many views a TED Talk will receive would have significant implications for content creators, marketers, and researchers alike. This paper examines whether machine learning models can predict the number of views a TED Talk receives, using a dataset with features such as talk duration, speaker characteristics, publication date, and audience engagement metrics. We apply multiple regression, decision trees, and ensemble methods to identify empirically which factors have the most influence and to determine which models predict best. The results indicate that the ensemble models, especially Random Forest and Gradient Boosting, clearly outperform the other models. These findings suggest that machine learning can help guide content strategy to reach audiences more effectively and can deepen understanding of how digital content is consumed.

Keywords: TED Talks, machine learning, predictive modeling, view prediction, Random Forest, Gradient Boosting, digital content analysis, audience engagement.