Abstract

BACKGROUND AND AIMS More than 20 000 native urinary peptides are currently known. They are highly dynamic and reflect the status of different organs, especially the kidney. Characterization of urinary peptide profiles (UPP) can depict kidney disease severity, progression and fibrosis, and provides information on disease etiology. Advanced machine learning algorithms make it possible to combine the changes in the highly complex UPP that are associated with specific disease etiologies and to reduce the data space to only a few dimensions. Here, we apply a supervised machine learning pipeline to visualize different chronic kidney disease (CKD) etiologies based on high-dimensional peptidomics data, toward non-invasive disease classification.

METHOD The Uniform Manifold Approximation and Projection (UMAP) algorithm was used as a novel nonlinear dimensionality-reduction technique to visualize and differentiate the UPP of patients with CKD of different etiologies. UPP of individual CKD patients (diabetic kidney disease [DKD], n = 386; IgA nephropathy, n = 743; vasculitis, n = 150) and of 369 healthy controls were extracted from the Human Urinary Proteome Database, which contains >85 000 proteomics datasets analyzed using capillary electrophoresis coupled to mass spectrometry. About 80% of the extracted datasets were used as the training set and 20% as the independent validation (test) set.

RESULTS Applying supervised UMAP to the DKD patient and control datasets gave excellent separation, with an F1 score of 99.5% ± 0.9% in the training set and 93.1% ± 3.3% in the independent test set. The approach was then applied to differentiate controls and three kidney diseases (DKD, IgA nephropathy and vasculitis) simultaneously. In the training set, accuracy reached up to 98% for DKD and controls, with an overall F1 score of 93.7% ± 2.3% (Figure). In the independent test set, accuracy decreased as expected to around 90% for controls, 83.8% for IgA nephropathy, 79.2% for DKD and 66.7% for vasculitis, with an overall F1 score of 81.9% ± 2.2%. Of note, controls (n = 369) were consistently classified with the highest accuracy of all groups, whereas the disease with the smallest sample size (vasculitis, n = 150) always showed the lowest accuracy. A substantial proportion of vasculitis samples was classified as IgA nephropathy, the group with the largest sample size (n = 743). The pipeline was validated with a permutation test, repeated 100 times using all samples from the CKD-free controls and the three kidney diseases. The resulting scores were normally distributed, with a mean of 32.5% and a standard deviation of 1.2%. Compared with the true F1 score of 81.9% reported above, the probability of obtaining such a high score by chance is very low (P < 0.01).

CONCLUSION We show that UMAP combined with supervised machine learning, applied to high-dimensional peptidomics data, distinguishes multiple kidney diseases with good accuracy and with a very small standard deviation across multiple train-test splits. To our knowledge, this study is the first of its kind to reduce the complexity of the urinary peptidome to a single point in space and to categorize disease etiology based on that spatial information. The approach has the potential to enable non-invasive differential diagnosis of kidney disease etiologies. To improve the accuracy of this non-invasive method, the inclusion of additional clinical parameters will be tested.
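
The label-permutation check described in the RESULTS can be illustrated with a minimal sketch. It is not the authors' pipeline: a toy nearest-centroid classifier on synthetic two-dimensional points stands in for supervised UMAP on peptidomics data, the F1 score is assumed to be macro-averaged (the abstract does not say which average was used), and all function names (`macro_f1`, `train_test_f1`, `permutation_test`, `make_toy_data`) are hypothetical. The idea is only to show how a null distribution of scores is built by shuffling class labels and how the observed score is compared against it.

```python
import random
from statistics import mean, stdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    scores = []
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def fit_centroids(X, y):
    """Toy stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, X):
    """Assign each point to the class with the nearest centroid."""
    def nearest(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return [nearest(x) for x in X]


def train_test_f1(X, y, rng, test_frac=0.2):
    """Fit on a random 80% split and return macro F1 on the held-out 20%."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    preds = predict(centroids, [X[i] for i in test])
    return macro_f1([y[i] for i in test], preds)


def permutation_test(X, y, n_perm=100, seed=0):
    """Compare the observed score with scores obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = train_test_f1(X, y, rng)
    null_scores = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # breaks any link between features and labels
        null_scores.append(train_test_f1(X, shuffled, rng))
    # Empirical p-value with the usual add-one correction.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, null_scores, p_value


def make_toy_data(rng, n_per_class=60):
    """Synthetic, well-separated 2-D clusters; purely illustrative."""
    centers = {"control": (0, 0), "DKD": (4, 0), "IgAN": (0, 4), "vasculitis": (4, 4)}
    X, y = [], []
    for lab, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(lab)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(42))
    observed, null_scores, p_value = permutation_test(X, y, n_perm=100, seed=1)
    print(f"observed F1 = {observed:.3f}")
    print(f"null F1 = {mean(null_scores):.3f} +/- {stdev(null_scores):.3f}")
    print(f"empirical p = {p_value:.4f}")
```

With 100 permutations, the smallest empirical p-value this scheme can resolve is 1/101, which is just under 0.01. Under the normal approximation reported in the abstract, the observed score of 81.9% lies (81.9 − 32.5) / 1.2 ≈ 41 standard deviations above the null mean of 32.5%.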
