Unsupervised pattern recognition of mixed data structures with numerical and categorical features using a mixture regression modelling framework

Shu-Kay Ng, Richard Tawiah, Geoffrey J McLachlan

doi:10.1016/j.patcog.2018.11.022

Open Access

https://doi.org/10.1016/j.patcog.2018.11.022

Journal: Pattern Recognition
Publication Date: Nov 20, 2018
Citations: 12
License type: cc-by-nc-nd

Affiliation: Griffith University, University of Queensland

Abstract

In the present era of “Big Data”, data collection involving a massive number of features with a mix of variable types is commonplace. Mixture model-based techniques for statistical cluster analysis of mixed numerical and categorical feature data have their limitations, due to the difficulty in specifying appropriate component densities when common multivariate distributions become invalid. This problem is particularly apparent in applications where the outcome feature variables are in a categorical form. An example of such an application is the analysis of binary morbidity data in a national health survey, where the aims are to quantify heterogeneous comorbidity patterns of health conditions and to identify (risk) features of individuals that explain the heterogeneity. In this paper, we propose an unsupervised mixture regression model of multivariate generalised Bernoulli distributions for cluster analysis on the basis of categorical outcome features and mixed risk features. The proposed method is illustrated using simulated data and two real data sets concerning comorbidity patterns among 20,788 Australians who participated in the 2007–2008 National Health Survey (NHS) and among 470 patients who were recruited in a randomised controlled trial of a health intervention about in-patient detoxification from alcohol, heroin or cocaine in Boston. The method is also readily applicable to clustering more general mixed-feature data via the framework of consensus clustering.
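
To make the idea of clustering binary outcome features with a Bernoulli mixture more concrete, the following is a minimal sketch in Python. It is not the authors' method: it fits a plain mixture of independent Bernoulli components by EM, with no regression on risk features and binary (rather than generalised multi-category) outcomes. The function name `fit_bernoulli_mixture` and all parameter names are hypothetical, chosen only for this illustration.

```python
import math
import random


def fit_bernoulli_mixture(data, n_components, n_iter=50, seed=0):
    """Fit a mixture of independent Bernoulli components by EM.

    Illustrative simplification (an assumption, not the paper's model):
    no covariates enter the mixing proportions or component probabilities.

    data: list of equal-length lists of 0/1 values.
    Returns (weights, probs, responsibilities, log_likelihoods), where the
    responsibilities come from the last E-step.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite

    weights = [1.0 / n_components] * n_components
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)]
             for _ in range(n_components)]
    log_likelihoods = []
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in log space with the log-sum-exp trick.
        resp = []
        total_ll = 0.0
        for x in data:
            log_joint = []
            for k in range(n_components):
                lp = math.log(max(weights[k], 1e-300))
                for j in range(d):
                    p = probs[k][j]
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                log_joint.append(lp)
            m = max(log_joint)
            log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint))
            total_ll += log_norm
            resp.append([math.exp(v - log_norm) for v in log_joint])
        log_likelihoods.append(total_ll)

        # M-step: weighted proportions and weighted feature frequencies.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            for j in range(d):
                num = sum(r[k] * x[j] for r, x in zip(resp, data))
                probs[k][j] = min(max(num / max(nk, eps), eps), 1.0 - eps)

    return weights, probs, resp, log_likelihoods
```

Calling `fit_bernoulli_mixture(rows, 2)` on a list of 0/1 rows returns the estimated mixing proportions, the per-component probability of each condition, and soft cluster memberships. Each row can then be assigned to the component with the largest responsibility.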

Similar Papers

Robust Mixture of Linear Regression Models
Shaheena Bashir ... E M Carter
Communications in Statistics - Theory and Methods | VOL. 41
15 Sep 2012

Analysis of performance of accuracy by adding new features individually using Relief-F and Budget Tree Random Forest (RFBTRF) method
...
International Journal of Nonlinear Analysis and Applications | VOL. 13
01 Jan 2021

Rough set model based feature selection for mixed-type data with feature space decomposition
Kyung-Jun Kim ... Chi-Hyuck Jun
Expert Systems with Applications | VOL. 103
08 Mar 2018

Securing a Smart Home with a Transformer-Based IoT Intrusion Detection System
Minxiao Wang ... Ning Yang
Electronics | VOL. 12
04 May 2023
