Abstract

Aims

Pulmonary hypertension (IP), defined by a mean pulmonary arterial pressure at rest ≥20 mmHg, can underlie various clinical conditions that differ in pathophysiological, haemodynamic, and therefore therapeutic aspects. The goal of our work was to apply a machine learning algorithm that could accurately distinguish precapillary from postcapillary pulmonary hypertension using non-invasive data (medical history, clinical, and echocardiographic data).

Methods and results

To achieve this goal, we used the C5.0 decision tree algorithm as implemented in the C50 package for R. The first step was data preparation. The dataset comprised 85 patients with IP, divided into XX precapillary (class 1) and YY postcapillary (class 2). Each patient was described by 11 features: comorbidities (arterial hypertension and atrial fibrillation), BMI, right axis deviation on ECG, DLCO, and several echocardiographic measurements (E/e′, right atrial area, S wave at TDI, pulmonary acceleration time, inferior vena cava, and right ventricular diameters). The dataset was split into a training subset, data.train (45 patients), and an evaluation subset (40 patients), preserving the proportion between classes. From the training subset, the C5.0 algorithm generated the decision tree shown in Figure 1. The root node was the mitral E/e′ ratio, followed by right axis deviation on the ECG and pulmonary acceleration time, which the algorithm considered the most discriminating features. The model was then validated on the evaluation subset; using the confusionMatrix function of the caret package, we obtained an accuracy of 0.87, a kappa statistic of 0.742, a sensitivity of 0.913, and a specificity of 0.823. The positive and negative predictive values were both 0.87.
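The reported metrics can be checked against a 2×2 confusion table. The sketch below is illustrative only: the abstract does not give the cell counts, so the values used here (21 true positives, 2 false negatives, 3 false positives, 14 true negatives) are an assumption, chosen because they sum to the 40 evaluation patients and reproduce the stated accuracy, kappa, sensitivity, and specificity once truncated to the reported number of decimals. The function name `binary_metrics` is hypothetical.

```python
# Assumed cell counts for the 40-patient evaluation subset. These are NOT
# reported in the abstract; they are one table consistent with its metrics.
tp, fn, fp, tn = 21, 2, 3, 14


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from 2x2 counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


metrics = binary_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, sensitivity and specificity are the true positive and true negative rates (21/23 and 14/17), while both predictive values come out at 35/40 of their respective predicted totals, that is 21/24 and 14/16.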
The performance of the model was also measured using the ROC curve, giving an area under the curve of 0.916.

Conclusions

Our results show that the decision tree algorithm, starting from echocardiographic and ECG data, discriminates well between precapillary and postcapillary IP. In particular, the decision chain consisting of a mitral E/e′ ratio ≤8, right axis deviation on the ECG, and a pulmonary acceleration time ≤80 ms seems to predict the IP class with reasonable accuracy. Our results confirm, however, that both the prediction and its estimated probability depend on the purity of the partitions learned while the decision tree is built. To improve the estimate of the algorithm's performance and generalize these results, we believe this approach should be evaluated on larger datasets, also considering other machine learning algorithms.
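The decision chain quoted in the Conclusions can be written as a simple rule check. This is a minimal sketch, not the fitted C5.0 tree: only the two thresholds (E/e′ ≤8 and pulmonary acceleration time ≤80 ms) and the right axis deviation criterion come from the abstract. The `Patient` class, its field names, and the function name are hypothetical, and because the abstract does not state which class this path predicts, the function only reports whether a patient falls on that path.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Field names are illustrative, not taken from the study dataset.
    e_over_e_prime: float         # mitral E/e' ratio
    right_axis_deviation: bool    # right axis deviation on ECG
    pulmonary_acc_time_ms: float  # pulmonary acceleration time in ms


def on_reported_chain(p: Patient) -> bool:
    """Return True when all three conditions of the quoted chain hold."""
    return (
        p.e_over_e_prime <= 8
        and p.right_axis_deviation
        and p.pulmonary_acc_time_ms <= 80
    )


# Two made-up patients, used only to exercise the rule.
example_on_path = Patient(7.5, True, 75.0)
example_off_path = Patient(12.0, False, 110.0)
print(on_reported_chain(example_on_path), on_reported_chain(example_off_path))
```

A fitted tree would also attach a class label and a class probability to each leaf; those are omitted here because the abstract does not report them.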