Optimization methods for the imputation of missing values in Educational Institutions Data

D Aureli, R Bruni, C Daraio

doi:10.1016/j.mex.2020.101208

Abstract

The imputation of missing values in the detailed data of Educational Institutions is a difficult task. These data contain multivariate time series, which cannot be satisfactorily imputed by many existing imputation techniques. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, etc. In other words, each imputed value has an impact on the whole set of data of the Institution. Therefore, imputation techniques for this specific case should be designed very carefully. We describe here the methods and the codes of the imputation methodology developed to impute the various patterns of missing values that appear in such interconnected data. In particular, the first part of the proposed methodology, called “trend smoothing imputation”, is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part of the proposed methodology, called “donor imputation”, is designed to impute larger chunks of missing data by using values taken from similar Institutions, again in order to respect their size and trend.

• Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression.
• Donor imputation can handle a full sequence missing in time series. It imputes the Recipient Institution using the values taken from a similar Institution, called Donor, selected using optimization criteria.
• The values imputed by our techniques should respect the trend, the size and the ratios of each Institution.
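The weighted combination behind trend smoothing imputation can be illustrated with a minimal Python sketch. The abstract does not specify the weights, so everything numeric here is an assumption: the function name `trend_smoothing_impute`, the mixing parameter `alpha`, and the choice of weighting observed values by the inverse of their distance in years from the missing one are all hypothetical, chosen only to show the structure "(a) weighted average plus (b) linear regression".

```python
def trend_smoothing_impute(values, alpha=0.5):
    """Fill None entries of a yearly series (illustrative sketch only).

    alpha is a hypothetical mixing weight between
    (a) a weighted average of the observed values and
    (b) a least-squares linear fit over the observed values.
    """
    observed = [(t, v) for t, v in enumerate(values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to fit a trend")

    # (b) ordinary least-squares line through the observed points
    n = len(observed)
    mean_t = sum(t for t, _ in observed) / n
    mean_v = sum(v for _, v in observed) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in observed)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in observed)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t

    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # (a) weighted average; assumed weights: 1 / distance in years,
        # so that closer years count more
        weights = [1.0 / abs(t - i) for t, _ in observed]
        average = sum(w * val for w, (_, val) in zip(weights, observed)) / sum(weights)
        regression = slope * i + intercept
        filled[i] = alpha * average + (1.0 - alpha) * regression
    return filled


# Hypothetical series of student counts with one missing year
students = [100, 110, None, 130, 140]
print(trend_smoothing_impute(students))
```

In this toy series the observed values lie exactly on a line, so both components agree and the imputed value follows the trend; on real data the two components differ, and the mixing weight controls how strongly the linear trend dominates.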

Highlights

Universities and other organizations providing higher level education are collectively called Higher Education Institutions (HEIs).
The data describing each specific HEI, for example the number of students, the number of graduates, etc., are needed to analyze and evaluate educational systems [1]. In many cases, these data contain a substantial amount of missing values.
If the number of students in a given year for a given university does not appear in the dataset, although it should have been recorded because that university was active in that year, that information is marked as a missing value.

Summary

Method Article

Optimization methods for the imputation of missing values in Educational Institutions Data. The first part of the proposed methodology, called “trend smoothing imputation”, is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part of the proposed methodology, called “donor imputation”, is designed to impute larger chunks of missing data by using values taken from similar Institutions, again in order to respect their size and trend.
Method name: Trend Smoothing Imputation and Donor Imputation
Keywords: Information Reconstruction, Data imputation, Machine learning, Interconnected data, Educational Institutions
Article history: Received 2 December 2020; Accepted 30 December 2020; Available online 4 January 2021.
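The idea of donor imputation described above can be sketched in Python as follows. This section does not state the optimization criteria used to select the Donor, so the sketch substitutes a simple stand-in: the candidate with the smallest mean relative difference from the Recipient on a shared size variable. The function names `select_donor` and `donor_impute`, the variable names `students` and `graduates`, and the ratio-based rescaling are all assumptions made for illustration.

```python
def select_donor(recipient, candidates, size_var="students"):
    """Return the name of the candidate closest to the recipient.

    Assumed criterion (not taken from the article): smallest mean
    relative difference on the yearly series of `size_var`.
    """
    def distance(candidate):
        pairs = list(zip(recipient[size_var], candidate[size_var]))
        return sum(abs(r - c) / max(abs(r), abs(c), 1e-9) for r, c in pairs) / len(pairs)

    return min(candidates, key=lambda name: distance(candidates[name]))


def donor_impute(recipient, donor, missing_var="graduates", size_var="students"):
    """Impute a fully missing series by applying the donor's yearly
    ratio missing_var / size_var to the recipient's size series, so the
    imputed values follow the recipient's own size and trend."""
    return [
        r_size * d_missing / d_size
        for r_size, d_missing, d_size in zip(
            recipient[size_var], donor[missing_var], donor[size_var]
        )
    ]


# Hypothetical data: the recipient has no "graduates" series at all
recipient = {"students": [1000, 1050, 1100]}
candidates = {
    "A": {"students": [980, 1040, 1120], "graduates": [196, 208, 224]},
    "B": {"students": [5000, 5100, 5200], "graduates": [1500, 1530, 1560]},
}
donor_name = select_donor(recipient, candidates)
imputed_graduates = donor_impute(recipient, candidates[donor_name])
print(donor_name, imputed_graduates)
```

Rescaling by the donor's ratio rather than copying its raw values is one way to keep the imputed series consistent with the Recipient's size and with the graduates-to-students ratio, which is the kind of interconnection the article stresses.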

Introduction

Method details