Word Frequencies in Linguistic Articles Published in SINTA Indexed Journals

Heri Heryono,Nani Darmayanti,Susi Yuliawati,Dadang Suganda

doi:10.5614/sostek.itbj.2023.22.1.8

Abstract

Multiword sequences are a language pattern that occurs when a bunch of words emerge in a similar register. In research papers conducted by lecturers and students, different topic areas and indexes has created various characteristics of lexical bundles. The method of this research is qualitative, combining corpus design to identify the sequence of words within the text. The corpus data were generated from five different indexing journals, yet the topic is linguistics. Initially, the whole papers were converted to text format to deal with readability in the program used. The program used was Orange Apps version 3.27 by applying the textable, data table, and text mining menus. The sources of the data are emphasized as being academic research indexed in SINTA 5, published in 2020. The main theory of used in this research is that of Biber’s (2007) which discusses the main characteristics and number of criteria for defining word strings. This observation resulted in 207.896 characters and 33.636 words. There were 4,273 words based on the pre-processing analysis result, which included transformation, tokenization, and PoS-tagging. From a total of 4,273 words, virus, deixis, and slang were the most frequently occurring. Based on these results, it can be concluded that the majority of journal articles are about viruses and slang. They pertain to the prevalent topic of pandemics at the time the journals were published. When the process of writing a journal article is in progress, this information may aid the authors in identifying the journal’s keywords and most frequent words.

Full Text