Automated Vocabulary Profiling of TOEIC Listening Materials: A CEFR-Aligned Approach for EFL Learners
This study examines the vocabulary characteristics of TOEIC Listening materials to support the development of more targeted English language teaching resources for EFL learners, particularly in Thai higher education. Using a corpus-based approach, we collected and analyzed a representative dataset of TOEIC preparation texts with a custom-built Python tool for vocabulary profiling. The tool performed key tasks such as frequency analysis, concordance generation, n-gram extraction, collocation detection, and CEFR-level classification. The vocabulary items were categorized using established lists, including the General Service List (GSL), Academic Word List (AWL), and CEFR levels. Results reveal that basic (K1) and function words dominate the materials, while a substantial proportion of off-list and domain-specific vocabulary was also identified. Most words fall within the B1 proficiency level, suggesting intermediate-level accessibility. The study contributes a novel, automated vocabulary profiling framework that integrates linguistic metrics and CEFR-based classification, offering practical implications for curriculum design, test preparation, and vocabulary instruction. This approach enhances the precision and efficiency of material evaluation, bridging the gap between test content and learner needs. The findings highlight the potential of automated tools to improve vocabulary-focused teaching strategies and inform language assessment practices in EFL contexts.
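The profiling tasks named in the abstract above (frequency analysis, n-gram extraction, and list-based classification) can be illustrated with a minimal Python sketch. This is not the authors' tool: the tokenizer, the function names `tokenize`, `profile`, and `ngrams`, and the tiny word lists are all assumptions made for the example, and the sample sets stand in for the real GSL and AWL.

```python
import re
from collections import Counter

# Tiny illustrative word lists (hypothetical samples, NOT the real GSL/AWL).
WORD_LISTS = {
    "K1": {"the", "a", "to", "is", "at", "will", "please", "your"},
    "AWL": {"schedule", "confirm"},
}

def tokenize(text):
    """Lowercase the text and return alphabetic tokens (inner apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def profile(tokens, word_lists):
    """Return the share of running words (tokens) that falls in each list.

    A token is assigned to the first list that contains it; tokens found
    in no list are counted as "off-list".
    """
    if not tokens:
        return {}
    counts = Counter()
    for tok in tokens:
        for name, words in word_lists.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            counts["off-list"] += 1
    return {name: n / len(tokens) for name, n in counts.items()}

def ngrams(tokens, n=2):
    """Count contiguous n-grams (bigrams by default)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

sample = "Please confirm your flight schedule. The flight will depart at gate nine."
tokens = tokenize(sample)
freq = Counter(tokens)                    # frequency analysis
coverage = profile(tokens, WORD_LISTS)    # list-based coverage
bigrams = ngrams(tokens, 2)               # n-gram extraction
```

A CEFR-level classifier could follow the same pattern as `profile`, with one word set per level, provided a level-tagged word list is available.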
- Research Article
2
- 10.30479/jmrels.2019.10266.1275
- May 1, 2018
University students are mainly advised to master the words in West’s General Service List (GSL) and Coxhead’s Academic Word List (AWL) in order to be able to read their academic texts easily and effectively. However, there are too many words in the two lists and a large number of them seem to be of less frequency in many academic disciplines; moreover, there are many important general and academic words which are missing in the two lists. The present study explored a corpus of psychology texts containing 3.4 million running words to work out the most frequent words used in psychology, a less investigated discipline. The corpus was analyzed by some text analysis software (TextStat and TextAnalys) and a list of 1587 most frequent word families was developed for psychology. The list included general English and academic words and no technical words of psychology. The frequency of GSL and AWL word families was investigated in the corpus to find out the GSL and AWL words highly frequent in psychology texts and also other high frequency words of psychology which are absent in the two lists. The results revealed that 1077 GSL and 95 AWL word families were of low frequency in psychology texts and there were 189 high frequency general and academic words which are absent in the GSL and AWL. The coverage of the developed psychology word list over the corpus was shown to be 2.2% higher than that of GSL plus AWL, although it contained 983 fewer words.
- Research Article
- 10.61508/refl.v25i2.165396
- Dec 31, 2018
- rEFLections
This corpus-driven study explores the use of words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL), as well as non-GSL and non-AWL words, in journal articles in the field of physical education and sport science. A 1.1 million-word corpus called the Physical Education and Sport Science Research Articles Corpus was created for this study. The corpus consists of 280 research articles published in seven international journals in the field of physical education and sport science. The results suggest that both the GSL and the AWL can help students focus on the right vocabulary when learning Technical English. The corpus helps students focus directly on the words that they will see most often in the texts they have to study. Moreover, a field-specific word list is compiled in this research. Field-specific word lists can help students learn necessary words that are also important for their field of study.
- Research Article
4
- 10.17509/ijal.v8i3.15269
- Jan 31, 2019
- Indonesian Journal of Applied Linguistics
Knowledge of specialized academic vocabulary is important for the academic success of EFL natural science students. Specialized words outside the General Service List (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000) are necessary for comprehending scientific text. The existing lists of words do not cover all sub-disciplines of natural science. The present study aims to explore the specialized academic words across 11 sub-disciplines of natural science. To identify the words, a corpus-based approach and an expert-judged approach were used. A 5.5-million-word corpus called the Science Academic Journal (SAJ) Corpus was created for this study. Applying the established word selection criteria, 513 word families were selected. The potential list was reviewed by a panel of experts in order to remove the overly technical words from the list. The Science Academic Word List (SAWL) was established with 432 word families and provided 5.82% coverage of the running words in the SAJ corpus. To validate the word list, the SAWL was tested against two independent corpora. The findings revealed that the SAWL contains 432 word families that are useful for reading journal articles in natural science disciplines. In addition, it was also found that the SAWL performed better on an independent corpus compared to the Science Word List (Coxhead & Hirsh, 2007). It is expected that the SAWL established in this study will be a useful source for learning and teaching vocabulary in natural science disciplines.
- Research Article
- 10.14384/kals.2017.24.3.167
- Aug 31, 2017
- Journal of Language Sciences
This paper is intended to develop a word list that helps engineering students with engineering vocabulary frequently used across different disciplines. For this goal, an Engineering English Corpus (EEC) including around 4,000,000 running words was compiled and constructed from 40 textbooks commonly used in 8 engineering disciplines: chemical, computer, electrical, electronics, environmental, industrial, materials science and mechanical. This paper identified a total of 1,170 word families comprised of General Service List (GSL), Academic Word List (AWL) and Technical Word List (TWL) through two criteria such as frequency and range. In detail, the cut-off frequency of GSL and AWL was more than 100 times and the range was at least 10 times in all 8 disciplines exclusive of TWL. The result of the analysis showed that 1) not all words in GSL and AWL were used frequently in the EEC; 2) GSL, AWL and TWL could be learned and taught regardless of the pre-determined order; 3) the created engineering English word list could meet the basic need of engineering students.
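Word selection by frequency and range, as described in the abstract above, can be sketched in Python. The abstract's wording of the thresholds is ambiguous, so this sketch assumes one reading: a word qualifies when its total frequency exceeds a cut-off and it occurs at least a minimum number of times in every sub-corpus. The function name `select_words`, its parameters, and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter

def select_words(subcorpora, min_total, min_per_subcorpus):
    """Select words by frequency and range.

    subcorpora: dict mapping a discipline name to its list of tokens.
    A word is kept when its total frequency is greater than min_total
    (frequency criterion) and it occurs at least min_per_subcorpus times
    in every sub-corpus (range criterion, under the assumed reading).
    """
    per_corpus = {name: Counter(toks) for name, toks in subcorpora.items()}
    total = Counter()
    for counts in per_corpus.values():
        total.update(counts)
    return sorted(
        word
        for word, n in total.items()
        if n > min_total
        and all(counts[word] >= min_per_subcorpus for counts in per_corpus.values())
    )

# Toy sub-corpora standing in for three disciplines (illustrative data only).
toy = {
    "chemical": ["pressure"] * 3 + ["valve"] * 4 + ["energy"] * 2,
    "computer": ["energy"] * 3 + ["pressure"] * 1 + ["loop"] * 5,
    "mechanical": ["pressure"] * 2 + ["energy"] * 2 + ["valve"] * 1,
}
selected = select_words(toy, min_total=5, min_per_subcorpus=1)
```

Under the reading assumed here, the paper's thresholds would correspond to `min_total=100` and `min_per_subcorpus=10` across its 8 disciplines.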
- Conference Article
- 10.1109/icamechs.2015.7287103
- Aug 1, 2015
Understanding the vocabulary profiles of students is an important point for improving classroom-based pedagogy as well as materials development based on students' abilities and needs in ESP (English for Specific Purposes). As the second phase of a longitudinal research to assess Japanese university students' vocabulary competency, the researchers collected data on the performance of second and third-year students majoring in science and engineering on the New General Service List Test (NGSLT) and the New Academic Word List Test (NAWLT) developed by Phil Bennett and Tim Stoeckel in 2015 using words from the New General Service List 1.01 (NGSL 1.01) and the New Academic Word List (NAWL) developed by Charles Browne, Brent Culligan and Joseph Phillips in 2014. The NGSL 1.01 consists of 2,801 of the most frequently-appearing words in general use, and is a revision of the original General Service List (about 2,000 words) which was compiled by Michael West in 1953. The NAWL consists of 963 frequently-appearing words in academic text, and is a revision of the original Academic Word List (570 words) which was compiled by Averil Coxhead in 2000. Both the NGSLT and NAWLT are diagnostic tests of written receptive knowledge of the New General Service List and the New Academic Word List respectively. The NGSLT and NAWLT use the same specifications as the Vocabulary Size Test (VST) which was developed by Paul Nation and David Beglar in 2007, and have relatively good test reliability. This paper introduces the background about the tests, the participants in the study, and the data collection method. It also interprets the results of the collected scores, and discusses the implications of the results between departments and years. In future studies, the researchers hope to find ways to perceive the vocabulary abilities of students in their particular ESP fields. By doing so, they can be better informed providers of instruction and create materials tailored to students' particular needs.
- Research Article
3
- 10.22099/jtls.2016.3901
- Oct 1, 2016
- Journal of Teaching Language Skills
This corpus-based study aimed at exploring the most frequently used academic words in linguistics and comparing the word list with the distribution of high-frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately 4 million words from four main linguistics sub-disciplines (phonology, morphology, semantics and syntax), was compiled and analyzed based on two criteria: frequency and range. Based on the analysis, a list consisting of 1263 academic word families was produced to provide a useful linguistics academic word list for native and non-native English speakers. Results showed that AWL words account for 10.18% of the entire LRAC, and GSL words account for 72.48% of the entire LRAC. The findings suggested that of 570 word families in Coxhead’s AWL, 381 (66.84%) word families correspond with the word selection criteria, which provide 29.88% of the word families in the Linguistics Academic Word List (LAWL). Furthermore, 224 word families that were frequently used in the linguistics research article corpus (LRAC) were not listed in the GSL and AWL. They accounted for 18.51% of the word families in the LAWL with coverage of 5.07% over the LRAC, and compared with the 2000 GSL, 658 word families were identified. The results have pedagogical implications for linguistics practitioners and EAP practitioners, researchers, and material designers.
- Research Article
76
- 10.1016/j.jeap.2013.07.001
- Sep 13, 2013
- Journal of English for Academic Purposes
A corpus-based study of academic vocabulary in chemistry research articles
- Research Article
2
- 10.15702/mall.2011.14.1.145
- Apr 1, 2011
- Multimedia-Assisted Language Learning
Current research on curriculum-based textbooks indicates that there is a lack of comprehensive corpus-based studies of textbooks in the Korean EFL context. This recognition prompted us to investigate the vocabulary levels of elementary and secondary curriculum-based English textbooks. A corpus of 5,628,795 running words from a total of 140 different textbooks with the inclusion of the activity books was compiled for analysis. The operational measures for comparison involved the 2,000 General Service List (GSL), the 570 Academic Word List (AWL), the British National Corpus, the Freiburg-Brown Corpus of American English, and the Freiburg-LOB Corpus of British English. Our results indicated that 68% of the textbook words were beyond the 2,570 word level (i.e., total of word families of the GSL & AWL). Further corpus-based analysis indicated that textbooks of secondary schools presented word lists as large as 7,430 words compared to the 3,000 words that are permitted at the high school level by the National Curriculum. In the second part of the study, views and opinions of 600 stakeholders (i.e., learners, teachers, and experts) on a revision of the Basic Word List of the National Curriculum are presented. The results provide implications for the development of a revised Basic Word List.
- Research Article
- 10.2139/ssrn.3094558
- Jan 1, 2017
- SSRN Electronic Journal
This study investigates the distribution and coverage of words in New General Service List (NGSL) and the Academic Word List (AWL) in social science research articles. Sixty-four open-access English social science research articles published in 2013-2015 in the ScienceDirect General category were selected and compiled to the Social Science Corpus (SSC). The AntWordProfiler 1.4.0 was utilized to calculate the frequency and coverage percentage of words from the two word lists. Word families in level 1 and level 2 of the NGSL were utilized over 70 percent, whilst level 3 word families were used around 60 percent of the entire SSC. Similarly, 99.65 percent of the AWL word families were discovered. Regarding coverage, the NGSL word families accounted for over 70 percent and the AWL word families covered around 14 percent revealing significant coverage of both word lists. The top 10 NGSL word families represented journals subject areas from which they were derived, whilst the top 10 AWL word families were used more repeatedly and linked with social science research areas. The finding of high distributions and coverage corroborated that the NGSL and the AWL significantly contribute to vocabulary pedagogy in preparing students for reading and writing social science research articles. Additionally, some pedagogical implication guidelines of the NGSL and the AWL such as flash cards, quizzes, and written tests were also introduced.
- Research Article
42
- 10.1016/j.esp.2018.02.002
- Mar 7, 2018
- English for Specific Purposes
The language of civil engineering research articles: A corpus-based approach
- Research Article
2
- 10.7575/aiac.alls.v.8n.2p.196
- Apr 30, 2017
- Advances in Language and Literary Studies
It is difficult for most second language learners in Malaysia to function proficiently in English due to limited vocabulary knowledge. It has also been challenging for TESL graduates to fit in as ENP teachers due to the lack of specialized vocabulary knowledge in the nursing field. Thus, a course book has always been a highly dependable aid in facilitating teaching and learning in an ENP classroom. The objective of this research is to identify the possible pedagogical aspects of two ENP commercial course books (“Oxford English for Careers Nursing 1” (OEFCN1) written by Tony Grice and “Nursing Your English Second Edition” (NYE) by Siti Salina Salim and Mazura Mastura Muhammad) in socializing learners into their discourse communities. The present research looks at the extent of vocabulary coverage in comparison with the General Service List (GSL), Academic Word List (AWL), Nursing Education Word List (NEWL) and the 2,000 most frequent nursing words. These course books were photocopied, scanned and converted into computer text files before they were analyzed using WordSmith 4.0, as it is able to provide elemental knowledge on the vocabulary coverage in both course books. The results indicated that both books showed significant results in terms of their coverage based on the three word lists. On the other hand, the word list of the 2,000 most frequent nursing words was shown to cover fewer tokens than the GSL, AWL and NEWL combined.
- Research Article
- 10.18853/jjell.2015.57.1.026
- Mar 1, 2015
- The Jungang Journal of English Language and Literature
The present study aims to create a word list to meet the needs of college students who need to understand English textbooks and develop practical knowledge in the area of airline cabin crew service (ACCS). A corpus of eleven cabin crew manuals was compiled (totaling 349,789 running words) to find the lexical coverage of the ACCS word list in the corpus and the high-frequency words that the list consists of. The word list was created according to three principles: specialized occurrence, range, and frequency. The results reveal that the General Service List (GSL) covers 58.16%, the Academic Word List (AWL) covers 18.47%, and the ACCS list covers 18.37% of the words in the corpus. The ACCS list includes 553 of the most frequent word families outside the GSL and AWL. The result from the small-scale reliability test of the ACCS coverage shows that the ACCS word list may serve as reference for a prospective English course in the field. Moreover, a comparison of the high-frequency word list with those of different disciplines showed significant differences among the lists. Although the size of the corpus is relatively small, the finding indicates that specialized vocabulary for the ACCS list is tied to the particular knowledge and the specific word list may be beneficial for ESP college students in the ACCS area. The study suggests that the ACCS word list may be a potential tool for incorporating vocabulary learning into a curriculum for learners and teachers.
- Research Article
102
- 10.3917/rfla.122.0065
- Dec 1, 2007
- Revue française de linguistique appliquée
The coverage of the General Service List (GSL) (West, 1953) and Academic Word List (AWL) (Coxhead, 2000) over a science-based written academic English corpus of approximately 875,000 words is 80%, compared with three corpora of the same size from arts (86.7%), commerce (88.8%), and law (88.5%) (Coxhead, 1998). The AWL coverage of 9.1% over science is similar to arts and law; the coverage of the GSL over science is 65%, 10% lower than the coverage over law, 8% less than arts, and 6% less than commerce. One way to address this gap in coverage is to conduct a corpus-based study of the vocabulary in academic science texts to establish whether there is a science-specific vocabulary consisting of words outside the GSL and AWL. Hirsh (2004) found that academic subject areas with the highest proportion of technical vocabulary make use of the lowest proportion of general service vocabulary. This pilot study found 318 such word families with coverage of approximately 4% over a science-specific corpus of 1.5 million running words, in contrast to its coverage of well under 1% of the arts, commerce, and law corpora mentioned above, and a 3,500,000 word corpus of fiction.
- Research Article
3
- 10.7575/aiac.ijalel.v.6n.2p.78
- Jan 4, 2017
- International Journal of Applied Linguistics and English Literature
The present study is conducted within the borders of lexicographic research, where corpora have increasingly become all-pervasive. The overall goal of this study is to compile an open-source OPEC (Organisation of Petroleum Exporting Countries) Word List (OWL) that is available for lexicographic research and vocabulary learning related to English language learning for the purpose of oil marketing and oil industries. To achieve this goal, an OPEC Monthly Reports Corpus (OMRC) comprising 1,004,542 words was compiled. The OMRC consists of 40 OPEC monthly reports released between 2003 and 2015. Consideration was given to both range and frequency criteria when compiling the OWL, which consists of 255 word types. Along with this basic goal, this study aims to investigate the coverage of the most well-recognised word lists, the General Service List of English Words (GSL) (West, 1953) and the Academic Word List (AWL) (Coxhead, 2000), in the OMRC. The 255 word types included in the OWL do not overlap with either the AWL or the GSL. Results suggest the necessity of making this discipline-specific word list for ESL students of oil marketing industries. The availability of the OWL has significant pedagogical contributions to curriculum design, learning activities and the overall process of vocabulary learning in the context of teaching English for specific purposes (ESP). Keywords: Vocabulary Profiling, Vocabulary Learning, Word List, OPEC, ESP.
- Research Article
7
- 10.7820/vli.v04.1.stoeckel.bennett
- Jan 1, 2015
- Vocabulary Learning and Instruction
This paper introduces the New General Service List Test (NGSLT), a diagnostic instrument designed to assess written receptive knowledge of the words on the New General Service List (NGSL) (Browne, 2014). The NGSL was introduced in 2013 as an updated version of West’s (1953) original General Service List. It is comprised of 2,800 high frequency headwords plus their inflected forms and is designed to provide maximal coverage of modern English texts. The test introduced here is divided into five 20-item levels, each assessing a 560-word frequency band of the NGSL. Using a multiple choice format, the NGSLT is intended to assist teachers and learners in identifying gaps in knowledge of these high frequency words. Data from 238 Japanese university students indicate the NGSLT is reliable (α = .93) and that it measures a single construct. A comparison of NGSLT and Vocabulary Size Test (Nation & Beglar, 2007) scores for a small group of learners shows that the NGSLT provides more detailed diagnostic information for high frequency words and may therefore be of greater pedagogic use for low and intermediate level learners. Ongoing developments include parallel versions of the NGSLT as well as a separate instrument to assess knowledge of the New Academic Word List. Both the NGSLT and New Academic Word List Test are freely downloadable from the NGSL homepage (www.newgeneralservicelist.org).