Abstract

Large-scale data analysis is becoming increasingly important, prompting many researchers to adopt new platforms and tools that can handle large amounts of data. In this article, we present a new evaluation of sentiment analysis on two large-scale datasets, the COVID-19 Vaccine Stance tweets and the COVID-19 Tweets dataset from IEEE DataPort, in the Apache Spark data system. The Apache Spark Scalable Machine Learning Library (ML) is used. We designed hybrid Minhash models from the library with four classification methods: Logistic Regression (LR), Naive Bayes, Support Vector Machine (SVM) and Random Forest (RF) classifiers, run in a parallel and distributed manner. In addition, Minhash with Locality-Sensitive Hashing (Minhash-LSH) is compared to Minhash-ML. Performance parameters such as user, system and real time, time consumed, and accuracy are used in the comparative analysis to examine the behaviour of the classifiers on an AWS Spark cluster, on a local Spark cluster and on a conventional system. The results indicate that the models in the Spark environment were extremely effective for processing high-dimensional data, which a conventional implementation either cannot process or, for some algorithms, processes only very slowly. The proposed model achieves accuracy above 99% on the Vaccine tweet dataset when classified with the Minhash-RF and Minhash-LR classifiers, and 100% on the COVID-19 Tweets dataset provided by IEEE DataPort when using the Minhash-SVM, Minhash-RF and Minhash-LR classifiers.
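For readers unfamiliar with Minhash, the following is a minimal, single-machine sketch of the underlying idea in plain Python. It is not the Spark ML implementation evaluated in the article: the function names (`tokens`, `minhash_signature`, `estimate_jaccard`), the choice of 64 hash functions, whitespace tokenisation and the use of seeded BLAKE2b hashes are all assumptions made for illustration. The sketch shows how two tweets are reduced to short signatures whose agreement rate approximates the Jaccard similarity of their token sets.

```python
import hashlib


def tokens(text):
    """Lower-case, whitespace-split token set of a tweet-like string."""
    return set(text.lower().split())


def _seeded_hash(item, seed):
    # Hypothetical helper: one 64-bit hash per (seed, item) pair,
    # standing in for a family of independent hash functions.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """Signature = minimum hash value of the set under each hash function."""
    if not items:
        raise ValueError("cannot compute a Minhash signature of an empty set")
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(1 for a, b in zip(sig_a, sig_b) if a == b) / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    t1 = tokens("the covid vaccine is safe and effective")
    t2 = tokens("the covid vaccine is safe")
    s1, s2 = minhash_signature(t1), minhash_signature(t2)
    print("exact:", jaccard(t1, t2), "estimated:", estimate_jaccard(s1, s2))
```

In a pipeline like the one described above, such signatures (or the hash buckets derived from them) would serve as compact features for a downstream classifier. More hash functions give a tighter similarity estimate at the cost of longer signatures.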
