Abstract

Two novel unsupervised learning algorithms were developed to improve clustering of time series data that are multi-modal and not separable in feature space, both common characteristics of chemical process data. The algorithms are extensions of the conventional Gaussian Mixture Model (GMM) and K-means clustering. Both were adapted to account for the time-dependent nature of chemical process data and are thus termed time-constrained GMM (TCGMM) and time-constrained K-means (TCK-means). The algorithms are evaluated using autoregressive time series data with small step changes in the means and variances, a problem that confounds conventional clustering algorithms. In Case Study 1, step changes in the means and variances are implemented at specific time intervals to create two modes. TCGMM outperforms the other algorithms, achieving at least 85% accuracy in identifying the modes. The TCGMM algorithm is also tested in a second case study, where combinations of mean and variance shifts are randomly instantiated based on a conditional probability table (CPT). TCGMM outperforms conventional GMM, with an average accuracy of 65.4% versus 46.6%, and learns the CPT with an average difference in the main diagonal entries (probabilities of remaining in the same mode) of 1.89% and an average difference in the off-diagonal entries (mode transition probabilities) of 0.664%.

Keywords: clustering, TCGMM, TCK-means
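As a rough illustration of the kind of test data described in the abstract, the following Python sketch generates an autoregressive series whose mean and variance shift between two modes according to a transition table, then compares a count-based estimate of that table with the true one. It is not the TCGMM or TCK-means algorithm. The function names, the AR(1) form, and all numeric values (`cpt`, `means`, `stds`, `phi`, series length) are assumptions chosen for illustration, and the table is estimated here from the known mode labels rather than learned by clustering.

```python
import numpy as np

def simulate_modes(transition, n_steps, rng):
    """Sample a mode sequence from a row-stochastic transition table."""
    n_modes = transition.shape[0]
    modes = np.empty(n_steps, dtype=int)
    modes[0] = 0  # assumed starting mode
    for t in range(1, n_steps):
        modes[t] = rng.choice(n_modes, p=transition[modes[t - 1]])
    return modes

def simulate_ar1(modes, means, stds, phi, rng):
    """AR(1) series whose mean and noise scale depend on the active mode."""
    x = np.empty(len(modes))
    x[0] = means[modes[0]]
    for t in range(1, len(modes)):
        prev_dev = x[t - 1] - means[modes[t - 1]]  # deviation from previous mode mean
        x[t] = means[modes[t]] + phi * prev_dev + rng.normal(0.0, stds[modes[t]])
    return x

def estimate_transition(modes, n_modes):
    """Estimate the transition table by counting consecutive mode pairs."""
    counts = np.zeros((n_modes, n_modes))
    for a, b in zip(modes[:-1], modes[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

rng = np.random.default_rng(0)
# Hypothetical two-mode table: rows are the current mode, columns the next mode.
cpt = np.array([[0.95, 0.05],
                [0.10, 0.90]])
modes = simulate_modes(cpt, 5000, rng)
# Small mean and variance shifts between modes (illustrative values only).
series = simulate_ar1(modes, means=np.array([0.0, 0.5]),
                      stds=np.array([1.0, 1.2]), phi=0.7, rng=rng)

cpt_hat = estimate_transition(modes, 2)
diag_err = np.abs(np.diag(cpt_hat) - np.diag(cpt)).mean()
off_mask = ~np.eye(2, dtype=bool)
off_err = np.abs(cpt_hat[off_mask] - cpt[off_mask]).mean()
print(f"mean diagonal difference: {diag_err:.4f}")
print(f"mean off-diagonal difference: {off_err:.4f}")
```

The two printed quantities mirror the diagonal and off-diagonal comparison reported for the CPT. In a clustering experiment, `modes` would be replaced by the labels an algorithm assigns to `series`.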
