The subject matter of the article is the processing of abdominal EMG recordings and the detection of breathing patterns. The goal is to automatically classify respiratory patterns into two classes, or clusters, corresponding to regular and irregular breathing, using machine learning (ML) methods. The object of the study is a dataset of 40 randomly selected abdominal EMG recordings (sampling rate 200 Hz) taken from the complete dataset published by the Computational Clinical Neurophysiology Laboratory and the Clinical Data Animation Laboratory of Massachusetts General Hospital. The tasks to be solved are as follows: finding the ETS (error-trend-seasonality) model for the EMG series using the exponential smoothing method; obtaining denoised and detrended signals; estimating the Hurst exponents of the EMGs from the power-law decay of the correlograms of the denoised and detrended signals; describing the variabilities, SNR, outlier fractions, and Hurst exponents with robust statistics, and performing correlation analysis and principal component analysis (PCA); analyzing the structure of the distance matrix with a graph-based technique; obtaining the periodograms in the frequency domain using the Wiener-Khinchin theorem; and finding and evaluating the best models and methods of classification and clustering with modern machine learning methods. The methods used are exponential smoothing, the Wiener-Khinchin theorem, graph theory, principal component analysis, programming in MAPLE 2020, and data processing in Weka.
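Two of the tasks listed above, the Wiener-Khinchin periodogram and the correlogram-based Hurst estimate, can be illustrated with a minimal sketch in Python with NumPy. This is not the authors' MAPLE 2020 code. The function names are hypothetical, and the Hurst estimate assumes the standard long-memory relation ρ(k) ∝ k^(2H−2), which the abstract does not state explicitly.

```python
import numpy as np

def autocovariance(x):
    """Biased sample autocovariance of the mean-removed signal, lags 0..N-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    full = np.correlate(x, x, mode="full")  # lags -(N-1)..(N-1), lag 0 at index N-1
    return full[n - 1:] / n

def periodogram_from_autocovariance(r):
    """Wiener-Khinchin theorem: the power spectrum is the Fourier
    transform of the autocovariance sequence."""
    # Arrange lags in circular order: 0, 1, ..., N-1, -(N-1), ..., -1
    symmetric = np.concatenate([r, r[:0:-1]])
    return np.real(np.fft.fft(symmetric))

def hurst_from_correlogram(rho, lags):
    """Fit log(rho) against log(lag); assumes rho(k) ~ k**(2H - 2)."""
    rho = np.asarray(rho, dtype=float)
    lags = np.asarray(lags, dtype=float)
    mask = rho > 0  # the logarithm needs positive correlations
    slope, _ = np.polyfit(np.log(lags[mask]), np.log(rho[mask]), 1)
    return 1.0 + slope / 2.0
```

With a 200 Hz recording, the frequency axis for the spectrum above would be `np.fft.fftfreq(2 * N - 1, d=1 / 200)`. Under the assumed relation, a correlogram decaying as k^(−0.3) corresponds to H = 0.85, which lies inside the range reported in the abstract.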
The authors obtained the following results: 1) the wide variability of the data was quantified with the median absolute deviation, which is the most robust statistic in this case; 2) most of the signals (38 of 40) contained frequent outliers, with outlier fractions from a few percent up to 24.6%; 3) four variables (outlier percentage, variability, SNR, and persistence, i.e. the Hurst exponent) form the attributes of each subject's input vector for subsequent machine learning in Weka; 4) the matrix of Manhattan distances between the subjects' vectors in the 4D attribute space allows the data set to be represented as a weighted graph whose vertices are the subjects; 5) the weights of the graph's edges reflect the distances between each pair of subjects, and the closeness centralities of the vertices allowed the data set to be split into two clusters of 11 and 29 subjects, a result confirmed by the Weka clustering algorithms; 6) the learning curve shows that a fairly small data set (25 subjects or more) may be sufficient for classification purposes. Conclusions. The scientific novelty of the results obtained is as follows: 1) the error-trend-seasonality model was the same for all recordings: the abdominal EMG of sleeping patients had additive errors and undamped trends without any seasonality; 2) the power-law decay of the correlograms was established, and the Hurst exponents were in the range 0.776–0.887, which indicates "long memory" (high persistence) of abdominal EMGs; 3) modified Z-scores and robust statistics with the highest breakdown values were used for the EMG parameters because of the many outliers; 4) breathing patterns were identified from the periodograms obtained in the frequency domain with the Wiener-Khinchin theorem; 5) a new graph-based method was successfully used to cluster the dataset, and parallel clustering with Weka algorithms confirmed the graph-based clustering results.
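The robust outlier measure and the graph-based step can likewise be sketched in Python with NumPy, under stated assumptions. The modified Z-score is written in its common form with the constant 0.6745 and a cutoff of 3.5. The abstract states neither value, so both are assumptions, as are all function names. Because Manhattan distance satisfies the triangle inequality, the shortest path between two vertices of the complete weighted graph is the direct edge, so the closeness centrality of a vertex reduces to (n − 1) divided by the sum of its distances to all other vertices.

```python
import numpy as np

def mad(x):
    """Median absolute deviation from the median."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def modified_z_scores(x):
    """Modified Z-scores; the constant 0.6745 is an assumed convention.
    Undefined when the MAD is zero."""
    x = np.asarray(x, dtype=float)
    return 0.6745 * (x - np.median(x)) / mad(x)

def outlier_fraction(x, threshold=3.5):
    """Share of samples whose |modified Z| exceeds the (assumed) cutoff."""
    return float(np.mean(np.abs(modified_z_scores(x)) > threshold))

def manhattan_matrix(vectors):
    """Pairwise Manhattan (L1) distances between row vectors."""
    v = np.asarray(vectors, dtype=float)
    return np.abs(v[:, None, :] - v[None, :, :]).sum(axis=2)

def closeness_centrality(dist):
    """Closeness of each vertex in the complete graph weighted by `dist`.
    Direct edges are the shortest paths because `dist` is a metric."""
    n = dist.shape[0]
    return (n - 1) / dist.sum(axis=1)
```

In the setting of the abstract, `vectors` would be a 40 × 4 array holding outlier percentage, variability, SNR, and Hurst exponent for each subject. Whether the attributes were rescaled before computing distances, and how the centralities were cut into two clusters, is not stated in the abstract, so the sketch stops at the centralities.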