Abstract

This paper analyzes the differences between written text and transcribed spoken text using current Natural Language Processing (NLP) methods. Attempts to differentiate spoken and written text have a long and rich history in fields such as linguistics, communication, and rhetoric, dating back to the early 20th century. The availability of large quantities of machine-readable data, together with machine learning algorithms that can process them, now makes it possible to use a large number of derived features. The research focuses on syntactic and lexical differences between written books and transcriptions of speeches by United States presidents, examining morphological, lexical, syntactic, and text-level aspects. The features considered include lexical diversity, syllable count, part-of-speech frequencies, and features relating to the parse tree, such as the average length of noun phrases and the use of interrogative sentences, among others. This study aims to improve our understanding of the differences between written text and transcribed speech, which is relevant to disciplines including computer science, applied linguistics, communication, and related fields.
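To make some of the simpler features concrete, the following is a minimal sketch in standard-library Python. It is not the authors' implementation. It assumes that lexical diversity is measured as the type-token ratio, that syllables can be approximated by counting vowel groups, and that interrogative sentences can be detected by a final "?". The helper names `tokenize`, `type_token_ratio`, `count_syllables`, and `interrogative_ratio` are hypothetical. Part-of-speech and parse-tree features such as noun-phrase length would require a tagger and a parser, which are not shown here.

```python
import re

# Hypothetical tokenizer: words are runs of letters and apostrophes, lowercased.
WORD_RE = re.compile(r"[A-Za-z']+")


def tokenize(text: str) -> list[str]:
    return [w.lower() for w in WORD_RE.findall(text)]


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity (assumed definition): distinct words / total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, then drop a silent final 'e'
    (except in '-le' endings such as 'table'). Always returns at least 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def interrogative_ratio(text: str) -> float:
    """Share of sentences ending in '?', using punctuation-based splitting
    (an assumption; a parse-tree approach could use question labels instead)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(s.endswith("?") for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = "We choose to go to the Moon. Why the Moon? We choose it because it is hard."
    toks = tokenize(sample)
    print(len(toks), type_token_ratio(toks), interrogative_ratio(sample))
```

Two caveats apply to this sketch. First, the raw type-token ratio falls as text length grows, so comparing long books with shorter speeches usually calls for a length-corrected diversity measure. Second, punctuation in speech transcripts is supplied by the transcriber, which means punctuation-based sentence and question counts partly reflect transcription conventions rather than the speech itself.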
