Use of the university’s enrolment campaign database for the development of a computer model to predict student expulsion

A V Zharikov, D Yu Kozlov, E V Zhuravlev, O V Zhurenkov

doi:10.1088/1742-6596/1615/1/012014

Abstract

This article describes the construction of a computer model that predicts whether students will run into problems during their studies at the university. Two data sources from Altai State University were used, covering 2013-2018: “Admissions Office” (the enrollee database) and “Dean’s office” (the student database). The two sources were merged with purpose-written SQL scripts. Analysis of the combined data set exposed difficulties typical of data analysis work: incomplete and inconsistent records, the same entity recorded under different names, and so on. To address these, we wrote a script in the R programming language that uses regular expressions to unify and standardize the data, and we restored missing values from information in other fields of the data set. We then discarded variables with near-zero variance, which could not contribute meaningfully to the predictive model. Next, the data set was divided into two parts: the 2013-2017 data were used to build a predictive model with the logistic regression algorithm, while the 2018 data were used to predict whether a particular student would be expelled. The 2013-2017 data were themselves split into training and test samples in the proportion 90% to 10%, respectively. Testing of the model, built in the R programming language, showed satisfactory accuracy, and the most significant factors affecting student expulsion were identified. The paper also substantiates the economic feasibility of using the developed model at the university.
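
The steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code: their implementation is in R and SQL, whereas the sketch below uses plain Python with only the standard library. The data are randomly generated toy values, and every name in it (`standardize_school`, `drop_near_zero_variance`, `split_train_test`, `fit_logistic`, the "entrance score" feature, the variance threshold) is a hypothetical choice made for illustration. In particular, the simple variance cutoff is an assumption standing in for whatever near-zero-variance criterion the authors applied.

```python
import math
import random
import re
from statistics import pvariance


def standardize_school(name):
    """Collapse spelling variants of one entity into one label (hypothetical rule)."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return re.sub(r"^(?:school|sch)\s*(?:no\s*)?(\d+)$", r"school \1", cleaned)


def drop_near_zero_variance(rows, threshold=1e-3):
    """Return indices of columns whose population variance exceeds the threshold."""
    return [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > threshold]


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of the items and hold out a fraction as the test sample."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """True means the model predicts expulsion."""
    return sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) >= 0.5


# Toy data (made up): a scaled entrance score plus a column that never varies.
rng = random.Random(42)
rows = []
for _ in range(200):
    score = rng.random()
    constant = 1.0  # identical for every student, so it carries no information
    expelled = 1 if score + rng.gauss(0, 0.1) < 0.4 else 0
    rows.append(([score, constant], expelled))

# Filter uninformative columns, split 90/10, fit, and score on the test sample.
keep = drop_near_zero_variance([x for x, _ in rows])
data = [([x[j] for j in keep], y) for x, y in rows]
train, test = split_train_test(data, test_fraction=0.1, seed=1)
w = fit_logistic([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, x) == bool(y) for x, y in test) / len(test)
print(f"kept columns: {keep}, test accuracy: {accuracy:.2f}")
```

In the workflow the abstract describes, a model fitted this way on the 2013-2017 data would then be applied to the 2018 records to flag students at risk of expulsion, and the fitted coefficients would be inspected to identify the most influential factors.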

Full Text