Abstract
This work focuses on the Gaussian Mixture Model (GMM), a machine learning model used for density estimation and cluster analysis in domains such as healthcare and networking. The Expectation-Maximization (EM) algorithm is commonly used to train GMMs. One of the main challenges facing this algorithm on embedded systems is severe memory constraints: EM requires several scans of the dataset, and we observed that when the dataset cannot fully reside in main memory, execution is dramatically slowed by I/O traffic. In this paper, we present an optimization of the EM algorithm for GMMs that reduces the number of I/O operations through two main contributions: (1) a divide-and-conquer strategy that splits the dataset into chunks, learns a GMM separately on each chunk, and combines the results incrementally, preventing data from being swapped in and out repeatedly during learning; and (2) restricting training to a subset of the data whose size is inferred online from data properties while preserving good accuracy. On average, our results show a 63% improvement in overall execution time with comparable accuracy. We also adapted GMM learning to run within a limited time budget while achieving a good trade-off between execution time and energy consumption. This solution met the fixed deadline in 100% of cases and reduced energy consumption by up to 68.77%.
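As a rough illustration of the divide-and-conquer idea, the sketch below fits a GMM per chunk and pools the per-chunk components into one mixture, rescaled by chunk size. This merge rule is an assumption for illustration only: the abstract does not specify the paper's incremental combination step, and scikit-learn's `GaussianMixture` stands in for the authors' EM implementation.

```python
# Minimal sketch of chunked GMM learning, assuming the dataset is streamed
# in memory-sized chunks. The merge (pooling per-chunk components weighted
# by chunk size) is a plausible stand-in, not the paper's actual method.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_in_chunks(chunks, n_components=3, seed=0):
    """Fit a GMM on each chunk, then pool the components into one mixture."""
    weights, means, covs, total = [], [], [], 0
    for chunk in chunks:
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(chunk)                    # EM scans only this chunk
        n = len(chunk)
        weights.append(gmm.weights_ * n)  # rescale by chunk size
        means.append(gmm.means_)
        covs.append(gmm.covariances_)
        total += n
    weights = np.concatenate(weights) / total  # renormalize to sum to 1
    return weights, np.concatenate(means), np.concatenate(covs)

# Usage: chunks are generated one at a time, so the full dataset never
# needs to reside in main memory.
rng = np.random.default_rng(0)
chunks = (rng.normal(size=(10_000, 4)) for _ in range(5))
w, mu, sigma = fit_gmm_in_chunks(chunks)
```

Note that pooling grows the component count with the number of chunks; an incremental scheme like the paper's would presumably also merge redundant components to keep the model compact.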