In data-based situation assessment applications, the proliferation of data acquired and recorded by current technological systems raises a key issue: the data remain unlabeled because labeling would take too much time and incur prohibitive costs. The data must therefore speak for themselves, and the different situations, e.g., normal or faulty, must be learned from the data alone. Clustering methods, also called unsupervised classification methods, can be used for that purpose. These methods group the samples according to some similarity criterion. The resulting clusters can be associated with different situations whose discrimination may be relevant to obtaining a proper diagnosis. Numerous algorithms have been developed in recent years for clustering numeric data, but these methods are not applicable to categorical data. This is the case for the DyClee algorithm, named DyClee-N in this paper. However, in many application domains, qualitative features are key to properly describing the different situations. DyClee-N was recast to produce a version, named DyClee-C, that accepts categorical features, but only categorical features. This paper presents DyClee-N&C, which subsumes both the numeric and the categorical feature-based algorithms, DyClee-N and DyClee-C respectively. DyClee-N&C is applied to a data set from the literature for risk evaluation in the automobile domain and compared to state-of-the-art clustering methods.