Abstract

Background
YouTube is a social media platform with a large viewership, yet little research has examined its role in the propagation of online health misinformation. This study proposes an automated pipeline to facilitate the collection and analysis of health misinformation on YouTube.

Methods
The pipeline relies on Python and the YouTube Data API. A preliminary test of the pipeline was conducted using two videos from the channel “@BobbyParrish” (5.55M subscribers, 1.5K videos). The pipeline was used to extract two videos with large view counts and comparable like and comment counts. For each video, this extraction includes the transcript and the engagement metrics. All comment threads under the videos are also collected, with the reply structure preserved. The pipeline then uses NLTK’s SentimentIntensityAnalyzer to score each comment for sentiment polarity and classify it as positive, negative, or neutral. It also generates visualizations of the sentiment distribution and a frequency-based word cloud of emojis extracted from the text.

Results
The pipeline passed the test satisfactorily. It retrieved the channel statistics and the metrics associated with the videos on the channel. It also extracted the transcripts and complete comment sections of the videos while preserving the reply structure found on YouTube. Automated analyses of the data produced comprehensive and accessible visualizations.

Conclusions
The proposed work has the potential to facilitate large-scale studies of the propagation of health misinformation on YouTube. Moreover, it can help public health officials use social inoculation to rapidly address viral videos that spread health misinformation. Future work includes integrating topic modelling and automatic classification of health misinformation into the analysis stage of the pipeline.
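The abstract does not include the pipeline's code. The following is a minimal sketch, under stated assumptions, of two of the Methods steps: flattening comment threads returned by the YouTube Data API's `commentThreads` resource into records that keep each reply's `parentId` (so the reply structure survives), and mapping a compound polarity score to a positive, negative, or neutral label. The ±0.05 cut-offs are the values conventionally used with VADER's compound score, not thresholds reported by the authors; `Comment`, `flatten_threads`, `label_from_compound`, and `label_comments` are hypothetical names. The scorer is passed in as a function so the sketch runs without NLTK installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed cut-offs: the values conventionally used with VADER's compound score.
# The abstract does not report which thresholds the pipeline uses.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


@dataclass
class Comment:
    """One comment flattened from a thread; parent_id links a reply to its parent."""
    comment_id: str
    text: str
    parent_id: Optional[str]  # None for a top-level comment
    sentiment: str = "unlabelled"


def _comment_text(snippet: dict) -> str:
    # Prefer the raw text; fall back to the display text if it is absent.
    return snippet.get("textOriginal", snippet.get("textDisplay", ""))


def flatten_threads(threads: List[dict]) -> List[Comment]:
    """Flatten commentThreads resources into Comment records, keeping reply links."""
    records: List[Comment] = []
    for thread in threads:
        top = thread["snippet"]["topLevelComment"]
        records.append(Comment(top["id"], _comment_text(top["snippet"]), None))
        # Replies (if any) carry the top-level comment's ID in snippet.parentId.
        for reply in thread.get("replies", {}).get("comments", []):
            snippet = reply["snippet"]
            records.append(
                Comment(reply["id"], _comment_text(snippet), snippet.get("parentId"))
            )
    return records


def label_from_compound(compound: float) -> str:
    """Map a compound polarity score in [-1, 1] to a three-way label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def label_comments(
    comments: List[Comment], score: Callable[[str], float]
) -> Dict[str, int]:
    """Label every comment in place and return the sentiment distribution."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        comment.sentiment = label_from_compound(score(comment.text))
        counts[comment.sentiment] += 1
    return counts
```

With NLTK installed and the `vader_lexicon` resource downloaded (`nltk.download("vader_lexicon")`), the scorer could be built once as `sia = SentimentIntensityAnalyzer()` and passed as `lambda t: sia.polarity_scores(t)["compound"]`; the returned counts are what a sentiment-distribution chart would plot. Keeping the scorer injectable lets the thread-flattening and labelling logic be tested offline.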
Key messages
• The pipeline offers public health officials and researchers a rapid tool to identify and analyze health misinformation on YouTube, facilitating timely interventions.
• By automating the extraction and sentiment analysis of YouTube data, the pipeline enhances the ability to monitor and respond to evolving public health misinformation trends.
