Abstract
YouTube is an increasingly popular entertainment platform, with videos catering to a wide range of interests. If L2 users are to become proficient in conversation, the primary form of language, then the affordances created by YouTube videos containing informal speech could be very useful. In the current study, a near-random corpus of 2602 YouTube video transcripts was compiled, and 200 randomly selected texts from the Spoken BNC2014 (Love et al., 2017) were used as a reference corpus representing informal spoken English. The texts were tagged with 67 linguistic features as part of an additive multi-dimensional analysis. The dimension scores for each text were then used in a cluster analysis to investigate which YouTube texts clustered with the Spoken BNC2014 texts. A two-cluster solution was chosen, with 666 YouTube texts and 171 Spoken BNC2014 texts in one cluster and the remaining texts in the other. A small sample of texts from each cluster was analysed in detail. The results show that this method has the potential to identify videos featuring informal speech, and that some videos in similar categories have very different linguistic styles.