Symbolic missing data imputation in principal component analysis

Paola Zuccolotto

doi:10.1002/sam.10101

Abstract

The concept of symbolic data was developed to represent variables whose measurement is affected by some internal variation. The idea has mainly been driven by the need to aggregate individuals in order to summarize large datasets into smaller matrices of manageable size while retaining as much of the original knowledge as possible. Nevertheless, it is also often applied to variables that are structured as symbolic variables from the outset, even though they are measured on single individuals. This paper deals with the latter framework and aims to show that symbolic data analysis techniques can be applied to the treatment of missing values. The algorithm for a symbolic imputation technique in principal component analysis is presented as a generalization of the basic strategy called interval imputation. An illustrative example and a real data case study show how the proposed technique works. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 171–183, 2011
