Distributed Supervised Sentiment Analysis of Tweets: Integrating Machine Learning and Streaming Analytics for Big Data Challenges in Communication and Audience Research

Carlos Arcila Calderón, Félix Ortega Mohedano, Mateo Álvarez, Miguel Vicente Mariño

doi:10.5944/empiria.42.2019.23254


Open Access



Journal: Empiria. Revista de metodología de ciencias sociales
Publication Date: Jan 15, 2019
Citations: 4
License type: CC BY-NC-SA 4.0

Abstract

The large-scale, real-time analysis of tweets using supervised sentiment analysis represents a unique opportunity for communication and audience research. Combining machine learning and streaming analytics in a distributed environment might help scholars obtain valuable data from Twitter and immediately classify messages according to their context, with no restrictions on time or storage, enriching cross-sectional, longitudinal and experimental designs with new inputs. Although communication and audience researchers have begun to use computational methods, most remain unfamiliar with the distributed technologies needed to face big data challenges. This paper describes the implementation of parallelized machine learning methods in Apache Spark to predict sentiment in real-time tweets and explains how this process can be scaled up using academic or commercial distributed computing when personal computers cannot handle the required computation and storage. We discuss the limitations of these methods and their implications for communication, audience and media studies.
