Abstract

Natural language processing (NLP) is a research area that investigates how computers can process natural language; it is the foundation of machine translation, natural language text processing, natural language generation, multilingual and cross-language information retrieval, speech recognition, parsing, and expert systems. Understanding natural language well enough to build or select appropriate processing algorithms calls attention to three major issues: humans' thought processes, the meaning of linguistic input in context, and world knowledge. These considerations have led to the development of various types of NLP tools for lexical and morphological analysis, semantic and discourse analysis, as well as knowledge-based approaches (cf. Chowdhury, 2003). After decades of evolution and advancement, the current stage of NLP, as Xi (2010) pointed out, allows language testing researchers to apply its techniques in developing automated scoring systems for language learning and assessment, and NLP algorithms give such systems a solid theoretical grounding.

Automated scoring systems have been adopted mainly for two kinds of language assessment: writing (i.e., essay scoring) and speaking (i.e., speech scoring). Automated essay scoring systems are generally designed to identify features of examinees' written production such as fluency (the number of words in the essay), diction (the variation in word length), and syntactic complexity (the number of various parts of speech); a toy illustration of such features follows this abstract. Several expert essay-scoring systems have been released, such as PEG (Project Essay Grade), IEA (Intelligent Essay Assessor), BETSY (Bayesian Essay Test Scoring sYstem), and IntelliMetric. Among all, the most well-known system is perhaps e-rater, developed by Educational Testing Service (ETS). E-rater was originally intended to serve as a second rater for the Analytical Writing Assessment (AWA) of the GMAT; currently it also serves as a second rater for the analytical writing section of the GRE and the independent writing task of the TOEFL iBT, and as the sole rater for TOEFL online practice tests.

Interestingly, far fewer expert systems exist for automated speech scoring than for automated essay scoring. To date, only two have been widely applied in language assessment: the Versant Tests by Ordinate Corporation and SpeechRater by ETS. The Versant Tests aim to assess examinees' everyday listening and speaking ability by computing scores for listening vocabulary, repeat accuracy, reciting and pronunciation, reading fluency, and repeat fluency; the scoring system also takes suprasegmental features (e.g., timing, pauses, and rhythm) into account, as sketched below. According to Ordinate Corporation's internal research, the Versant Tests allow for highly efficient test administration, and their results show high reliability as well as strong prediction of examinees' real-life performance (Townshend & Todic, 1999). However, other researchers (cf. Bernstein, 1999; Xi, Higgins, Zechner, & Williamson, 2008) have pointed out that the task types adopted in the Versant Tests limit how representative the scores are of communicative competence, since many higher-order cognitive abilities and much complex linguistic knowledge are not elicited. SpeechRater, on the other hand, was developed by ETS specifically for scoring the speaking section of TOEFL online practice tests.
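To make the feature glosses above concrete, here is a minimal sketch in Python of how fluency, diction, and syntactic complexity might be computed. It is an illustration only, not the actual feature set of e-rater or any other system named above; it assumes NLTK is installed together with its 'punkt' and 'averaged_perceptron_tagger' data packages, and the function name surface_features is ours.

    import statistics
    import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' data are downloaded

    def surface_features(essay: str) -> dict:
        """Toy versions of the three surface features glossed in the abstract."""
        words = [t for t in nltk.word_tokenize(essay) if t.isalpha()]
        lengths = [len(w) for w in words]
        tags = [tag for _, tag in nltk.pos_tag(words)]
        return {
            "fluency": len(words),                                      # number of words
            "diction": statistics.pstdev(lengths) if lengths else 0.0,  # variation in word length
            "syntactic_complexity": len(set(tags)),                     # number of distinct POS tags
        }

    print(surface_features("The quick brown fox jumps over the lazy dog."))

An operational scorer would feed many such features into a trained statistical model rather than report raw counts.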
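Similarly, the suprasegmental timing features mentioned for the Versant Tests can be sketched under the assumption that an automatic aligner supplies word-level start and end times. The alignment triples, the 0.2-second pause threshold, and the function timing_features below are hypothetical illustrations, not Ordinate's actual pipeline.

    from statistics import mean

    # (word, start_sec, end_sec) triples -- hypothetical forced-aligner output
    alignment = [("the", 0.00, 0.18), ("cat", 0.30, 0.62), ("sat", 1.40, 1.75)]

    def timing_features(words, pause_threshold=0.2):
        """Speaking rate and pause statistics from word-level timestamps."""
        # A pause is a between-word silence at least pause_threshold seconds long.
        pauses = [b_start - a_end
                  for (_, _, a_end), (_, b_start, _) in zip(words, words[1:])
                  if b_start - a_end >= pause_threshold]
        duration = words[-1][2] - words[0][1]  # total elapsed speaking time
        return {
            "speech_rate_wps": len(words) / duration,  # words per second
            "num_pauses": len(pauses),
            "mean_pause_sec": mean(pauses) if pauses else 0.0,
        }

    print(timing_features(alignment))

Rhythm measures would build on the same timestamps, for example via the variability of word or syllable durations.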
