Abstract

Multivariate data sets often contain gaps in the data matrix. Especially with medical data, missing values are not always avoidable. Most techniques of data analysis do not allow for data gaps; a brief overview is given of the methods currently used to cope with this problem. There are two major groups of missing-data handling techniques: preprocessing techniques applied before the data analysis, and techniques integrated into the data analysis. Preprocessing techniques can involve deletion of incomplete objects or variables, which discards existing values, or replacement of missing data by estimates, which introduces pseudo-information and bias. Integrated methods are not usually satisfactory. To avoid most of these disadvantages, a new preprocessing technique is proposed for deleting missing data. The algorithm comprises a stepwise deletion of both variables and objects, which retains as much of the data as possible. It is demonstrated on several artificially constructed problem data sets and on some real clinical data collections. It is shown to retain considerably more of the original data sets than other deletion procedures.
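The idea of stepwise deletion of both variables and objects can be illustrated with a minimal sketch. The abstract does not state the selection criterion of the proposed algorithm, so the greedy rule used here (remove whichever row or column currently has the largest fraction of missing cells) is an assumption made for illustration only, as are the function name `stepwise_delete` and the example matrix.

```python
def stepwise_delete(data):
    """Greedy illustration, not the published algorithm.

    data: list of equal-length rows (objects); None marks a missing cell.
    Returns the indices of the rows and columns that are kept.
    """
    rows = list(range(len(data)))
    cols = list(range(len(data[0]))) if data else []
    while rows and cols:
        # Count missing cells per remaining row and per remaining column.
        row_miss = {r: sum(data[r][c] is None for c in cols) for r in rows}
        col_miss = {c: sum(data[r][c] is None for r in rows) for c in cols}
        if not any(row_miss.values()):
            break  # the remaining submatrix is complete
        worst_row = max(rows, key=lambda r: row_miss[r])
        worst_col = max(cols, key=lambda c: col_miss[c])
        # Assumed rule: compare fractions missing, cross-multiplied to
        # avoid division (row fraction = row_miss / number of columns,
        # column fraction = col_miss / number of rows).
        if row_miss[worst_row] * len(rows) >= col_miss[worst_col] * len(cols):
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# Hypothetical 4 x 3 data matrix with four missing cells.
example = [
    [1.0, 2.0, None],
    [3.0, 4.0, None],
    [5.0, None, None],
    [7.0, 8.0, 9.0],
]
kept_rows, kept_cols = stepwise_delete(example)
retained_cells = len(kept_rows) * len(kept_cols)
```

On this example the sketch first drops the third variable and then the third object, leaving a complete 3 x 2 block of six values. Deleting every incomplete object would keep only three values, and deleting every incomplete variable would keep four.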
