Abstract

Many YouTube videos include written transcripts of their audio, which offer a record of the language used on YouTube. One important measure of language usage is word frequency. The transcripts of one million YouTube videos from the YouTube-8M data set were scraped and analyzed using student-developed software and libraries in R, Python, and Microsoft Excel. The word frequencies of the YouTube data set were shown to correlate with commonly used word frequency measures from established studies, such as the subtitle word frequency and the HAL word frequency.
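
As an illustration of the kind of analysis described, the following minimal Python sketch counts word frequencies across a few transcripts and computes a Pearson correlation against a reference frequency list on a log scale. It is not the study's software: the function names, the tokenization rule, the log transform, and the toy `transcripts` and `reference` values are all assumptions made for this example, and the reference numbers are invented placeholders rather than actual subtitle or HAL norms.

```python
import math
import re
from collections import Counter


def word_frequencies(transcripts):
    """Count lowercase word tokens across an iterable of transcript strings."""
    counts = Counter()
    for text in transcripts:
        # Assumed tokenization: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_frequency_correlation(counts, reference):
    """Correlate log10 counts for the words present in both frequency lists."""
    shared = sorted(set(counts) & set(reference))
    xs = [math.log10(counts[w]) for w in shared]
    ys = [math.log10(reference[w]) for w in shared]
    return pearson(xs, ys)


# Toy inputs (hypothetical, for illustration only).
transcripts = ["the cat sat on the mat", "the dog and the cat", "a dog sat"]
reference = {"the": 1000, "cat": 50, "dog": 60, "sat": 20, "a": 800}

counts = word_frequencies(transcripts)
r = log_frequency_correlation(counts, reference)
print(counts.most_common(3))
print(round(r, 3))
```

Only words found in both lists enter the correlation, so the result depends on how well the vocabularies overlap. A real analysis would also need to decide how to handle words missing from either list.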
