Abstract

Over the last few decades, social media platforms have become very popular. People of all ages, genders, and walks of life are present on at least one social platform. The data generated on these platforms has been, and continues to be, used for recommendations, marketing, forecasting, and prediction. The movie industry worldwide produces a large number of movies each year, and their success depends on factors such as budget, director, and cast. Increasingly, however, a movie's rating is predicted from social media data related to that movie. Such predictions can help businesses that rely on the movie industry make promotional and marketing decisions. This report collects movie data from IMDB and the corresponding social media data from YouTube and Wikipedia, drawing on multiple sources and APIs, and compares the performance of two machine learning algorithms, Random Forest and XGBoost, both best known for high accuracy on small datasets with large feature sets.

Highlights

  • Living in a socially and digitally connected world, everyone leaves a digital trace of themselves in different forms on the web

  • Movie success prediction draws on a wide range of attributes, which allows a holistic approach to prediction. Movies create a lot of buzz in the digital space, and the stardom of celebrities fuels both praise and criticism on social media platforms

  • The objective is to predict movie ratings using two ensemble learning algorithms, Random Forest and XGBoost, so that the success or failure of a movie can be evaluated before its release, and to compare the two algorithms on their performance


Summary

INTRODUCTION

Living in a socially and digitally connected world, everyone leaves a digital trace of themselves in different forms on the web. Many social media platforms by default rank comments by popularity and relevance. Rather than collecting every comment on each video, only the comments with the greatest relevance, enough to cover almost every point of view, were collected. This strategy made it possible to trace the digital footprints of buzz and move closer to predicting the future of the product. Several hyperparameters need to be set to use the XGBoost algorithm efficiently, including n_estimators, max_depth, eta (the learning rate), and reg_alpha and reg_lambda (the regularization terms).
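As a rough illustration of the hyperparameters named above, the sketch below collects them in a dictionary and, if the xgboost package is installed, passes them to its scikit-learn style regressor. The numeric values and the variable names xgb_params and model are assumptions chosen for illustration; they are not the settings used in the report.

```python
# Illustrative XGBoost hyperparameters. The values are placeholders
# (assumptions), not the tuned settings from the report.
xgb_params = {
    "n_estimators": 200,    # number of boosted trees
    "max_depth": 4,         # maximum depth of each tree
    "learning_rate": 0.1,   # called "eta" in XGBoost's native API
    "reg_alpha": 0.0,       # L1 regularization on leaf weights
    "reg_lambda": 1.0,      # L2 regularization on leaf weights
}

try:
    # XGBRegressor is xgboost's scikit-learn compatible wrapper.
    from xgboost import XGBRegressor
    model = XGBRegressor(**xgb_params)
except ImportError:
    # xgboost is not installed; keep only the configuration.
    model = None

print(sorted(xgb_params))
```

A smaller learning rate usually calls for more trees, so n_estimators and learning_rate are typically tuned together, for example with a cross-validated grid search.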

Problem Statement
LITERATURE REVIEW
DATA COLLECTION PROCEDURE
MODEL IMPLEMENTATION USING THE SCIKIT-LEARN LIBRARY
TESTING AND EVALUATION RESULTS
Findings
CONCLUSION AND FUTURE SCOPE

