Abstract

India is the world's largest producer of films. The Indian film industry was established in 1913 and is the second oldest in the world. In 2017 it ranked third worldwide by box office revenue, at US$2.18 billion. The industry is multilingual; its largest segment is the Hindi film industry, which is based mostly in Mumbai (Bombay) and is commonly referred to as "Bollywood". This paper attempts to predict whether an upcoming movie will be a blockbuster, neutral, or a flop. Such predictions can help production houses plan their advertising and choose the best release window given the overall environment. We propose using two data mining classification techniques, the Naive Bayes classifier and the decision tree, in the data mining tool Orange. Data mining is the process of transforming raw data into useful information by discovering patterns in large data sets; it draws on machine learning, statistics, and database systems, and it is the analysis step of the Knowledge Discovery in Databases (KDD) process. Classification assigns each record to one of a predefined set of classes according to its attributes. Naive Bayes is a probabilistic classifier based on Bayes' theorem; here it predicts the categorical class label (blockbuster, neutral, or flop) of an upcoming movie from attributes such as rating, month and year of release, and genre (drama, action, romance, comedy, mystery, thriller, and others). A decision tree is a supervised learning method that builds a model from training data and predicts class values using decision rules learned from prior data. To perform the research, we used Orange, an open-source, component-based visual programming software package for data visualization, machine learning, data mining, and data analysis.
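
To make the classification step concrete, the following is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not the Orange workflow used in the paper: the training rows, the three attributes (genre, release month, rating band), and the function names `train_naive_bayes` and `predict` are all assumptions invented for illustration, and add-one (Laplace) smoothing is an assumed choice for handling attribute values not seen with a class.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (genre, release_month, rating_band) -> outcome.
# These rows are invented for illustration and are NOT the paper's data.
TRAIN = [
    (("action", "dec", "high"), "blockbuster"),
    (("action", "jul", "high"), "blockbuster"),
    (("drama", "dec", "high"), "blockbuster"),
    (("comedy", "mar", "medium"), "neutral"),
    (("drama", "jul", "medium"), "neutral"),
    (("romance", "dec", "medium"), "neutral"),
    (("thriller", "mar", "low"), "flop"),
    (("comedy", "mar", "low"), "flop"),
    (("romance", "jul", "low"), "flop"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, features):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for i, value in enumerate(features):
            count = value_counts[(i, label)][value]
            # Add-one smoothing so an unseen value does not zero out the class.
            log_p += math.log((count + 1) / (n + len(vocab[i])))
        scores[label] = log_p
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
prediction = predict(model, ("action", "dec", "high"))
print(prediction)
```

On this toy data, a high-rated action film released in December scores highest for the blockbuster class, because each of its attribute values occurs most often among the blockbuster rows. A decision tree would instead split the same rows on one attribute at a time.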
