Abstract

Classification and clustering problems are closely connected with pattern recognition, where many general algorithms have been developed and used in various fields. Depending on the complexity of the patterns in the data, classification and clustering procedures should take into account both continuous and categorical data, which can be partially missing or erroneous due to measurement and human errors. However, most algorithms cannot handle missing data, so imputation methods are required to fill in values before they can be applied. Hence, the main objective of this work is to define a classification and clustering framework that handles both outliers and missing values. An approach based on mixture models is preferred, since mixture models provide a mathematically grounded, flexible and meaningful framework for a wide variety of classification and clustering requirements. More precisely, a scale mixture of Normal distributions is extended to handle outliers and missing data for any type of data. Variational Bayesian inference is then used to find approximate posterior distributions of the parameters and to provide a lower bound on the model log evidence, which serves as a criterion for selecting the number of clusters. Finally, experiments are carried out to demonstrate the effectiveness of the proposed model through an application in Electronic Warfare.
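As a reminder of why the lower bound can serve as a model selection criterion, the standard variational identity is sketched below. The symbols are assumptions introduced for illustration and are not taken from the paper: X denotes the observed data, K the number of clusters, \Theta all latent variables and parameters, and q the variational distribution.

```latex
% Standard evidence decomposition; notation is illustrative, not the paper's.
\ln p(X \mid K)
  = \mathcal{L}_K(q)
  + \mathrm{KL}\bigl(q(\Theta) \,\|\, p(\Theta \mid X, K)\bigr)
  \;\ge\; \mathcal{L}_K(q),
\qquad
\mathcal{L}_K(q)
  = \mathbb{E}_{q}\bigl[\ln p(X, \Theta \mid K)\bigr]
  - \mathbb{E}_{q}\bigl[\ln q(\Theta)\bigr].

% One common way to use the bound: fit q for each candidate K and keep the
% value of K with the largest optimized bound.
\hat{K} = \arg\max_{K} \; \mathcal{L}_K(q^{\ast}_K)
```

Because the Kullback-Leibler term is non-negative, maximizing the bound over q tightens it toward the log evidence, which is what makes comparing the optimized bounds across values of K a reasonable proxy for comparing evidences.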

Highlights

  • Classification and clustering problems are closely connected with pattern recognition [1], where many general algorithms [2,3,4] have been developed and used in various fields [5,6]

  • An approach based on mixture models is preferred since mixture models provide a mathematically based, flexible and meaningful framework for the wide variety of classification and clustering requirements [8]

  • Outliers are considered only for continuous data, since categorical variables recorded in databases are assumed to be reliable and unreliable ones are processed as missing data


Summary

Introduction

Classification and clustering problems are closely connected with pattern recognition [1], where many general algorithms [2,3,4] have been developed and used in various fields [5,6]. Depending on the complexity of the patterns in the data, classification and clustering procedures should take into account both continuous and categorical data, which can be partially missing or erroneous due to measurement and human errors. The location mixture model [9], which assumes that continuous variables follow a multivariate normal distribution conditionally on the categorical variables, is retained because it better models the relations between continuous and categorical features when data patterns are mostly designed by first choosing patterns of categorical features to achieve a specific goal and then choosing continuous features that meet the constraints related to the chosen patterns and the problem environment. The location mixture model naturally captures that dependence structure. A scale mixture of Gaussian distributions [11] is then extended to handle outliers and missing data for any type of data.
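The generative idea behind a location mixture combined with a scale mixture of Gaussians can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model: all parameter values, the single categorical and single continuous feature, the Gamma mixing distribution (which makes the continuous variable Student-t distributed once the scale is integrated out) and the missing-completely-at-random masking are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy parameters (not from the paper): 2 clusters, one
# categorical feature with 3 levels, one continuous feature.
WEIGHTS = [0.6, 0.4]              # cluster proportions
CAT_PROBS = [[0.7, 0.2, 0.1],     # P(category | cluster 0)
             [0.1, 0.3, 0.6]]     # P(category | cluster 1)
MEANS = [[0.0, 1.0, 2.0],         # mean of x given (cluster, category)
         [5.0, 6.0, 7.0]]
SIGMA = 0.5                       # common standard deviation
NU = 3.0                          # degrees of freedom of the scale variable


def sample_point(rng, missing_rate=0.1):
    """Draw (cluster, category, x) from the toy location/scale mixture."""
    k = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
    c = rng.choices(range(len(CAT_PROBS[k])), weights=CAT_PROBS[k])[0]
    # Scale variable u ~ Gamma(shape=NU/2, rate=NU/2). gammavariate expects a
    # scale parameter, so the rate NU/2 becomes the scale 2/NU.
    u = rng.gammavariate(NU / 2.0, 2.0 / NU)
    # Given (k, c, u), x is Gaussian with a mean that depends on the category.
    # A small u inflates the variance, which is how outliers are generated.
    x = rng.gauss(MEANS[k][c], SIGMA / u ** 0.5)
    # Hide entries completely at random to mimic missing data (None = missing).
    if rng.random() < missing_rate:
        x = None
    if rng.random() < missing_rate:
        c = None
    return k, c, x


rng = random.Random(0)
data = [sample_point(rng) for _ in range(1000)]
```

Each element of `data` is a tuple `(cluster, category, x)` in which `category` or `x` may be `None`. Inference runs in the opposite direction: the cluster label and the scale variable are unobserved, and the missing entries have to be handled by the model rather than imputed beforehand.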

Assumptions on Mixed-Type Data
Distribution of Mixed-Type Data
Outlier Handling
Missing Data Handling
Model and Inference
Variational Bayesian Inference
Classification and Clustering
Application
Classification Experiment
Clustering Experiment
Findings
Conclusions
