This article addresses two key tasks in data analysis, time series classification and clustering, with a focus on heart sound recordings. A central challenge in time series analysis is that series are difficult to compare because they vary in length, shape, and amplitude. Several algorithms were applied to these tasks: a Long Short-Term Memory (LSTM) network, k-nearest neighbors (KNN), and a recurrent neural network for classification, and K-means and DBSCAN for clustering. The study evaluates how effectively these methods solve classification and clustering problems on time series of heart sound recordings. The results indicate that LSTM is a powerful tool for time series classification because it retains contextual information over time. KNN achieved high accuracy and speed in classification, although its limitations became apparent on larger datasets. For clustering, K-means proved more effective than DBSCAN, showing higher clustering quality on metrics such as the silhouette score and the Rand score, among others. The data were obtained from the UCR Time Series Archive and comprise heart sound recordings in several categories: normal sounds, murmurs, additional heart sounds, artifacts, and extrasystolic rhythms. The analysis demonstrates that the chosen classification and clustering methods can be used effectively for diagnosing heart disease. The research also opens opportunities for further improvement of data processing and analysis methods, particularly in the development of new medical diagnostic tools. Overall, this work illustrates the effectiveness of machine learning algorithms for time series analysis and their significance for improving the diagnosis of cardiovascular disease.
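
To make the clustering comparison concrete, the following is a minimal sketch of how two of the quality metrics named above, the Rand score and the silhouette score, could be computed for a given cluster assignment. It is not the authors' code. The function names, the toy two-dimensional feature vectors, and the two example labelings are hypothetical. The sketch also assumes the plain (unadjusted) Rand index and Euclidean distance, since the abstract does not specify either.

```python
from itertools import combinations
from math import dist


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean); assumes at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: coefficient is defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: six feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels_true = [0, 0, 0, 1, 1, 1]
labels_good = [1, 1, 1, 0, 0, 0]  # same partition, cluster ids swapped
labels_poor = [0, 0, 1, 1, 1, 1]  # one point assigned to the wrong group

for name, labels in [("good", labels_good), ("poor", labels_poor)]:
    print(name, rand_index(labels_true, labels), silhouette_score(points, labels))
```

The Rand index requires reference labels, whereas the silhouette score depends only on the data and the assignment. That difference is one reason a comparison such as K-means versus DBSCAN is usually reported on both kinds of metric.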