Abstract
<p>We present results from an exploratory study of deep clustering applied to high-dimensional mass spectrometry data. Measurements were made on aerosol filter samples collected at field sites in Beijing and Delhi across multiple seasons. Approximately 450 samples were analysed using ultra-high-pressure liquid chromatography coupled with heated electrospray ionisation and Orbitrap mass spectrometry (UHPLC-HESI-Orbitrap MS), yielding over 1000 molecular markers per sample, each defined by a distinct organic molecule and retention time. With far more dimensions than samples (i.e. more columns than rows), this dataset is poorly suited to traditional nearest-neighbour clustering methods: the largest peaks dominate the signal, and nearest-neighbour distances become less discriminating as dimensionality increases.</p><p><br>We demonstrate the impact of deep clustering, which uses autoencoders to reduce data dimensionality before clustering. An autoencoder is a deep learning method built from two linked neural networks. An “encoder” network compresses the data into a lower-dimensional latent space, while a “decoder” network maps the latent space back to the original dimensions of the dataset. The two networks are trained together to minimise the information lost in this reconstruction, producing a clusterable latent space whose information content is similar to that of the original dataset but with far fewer dimensions.</p><p><br>We compare deep clustering with traditional hierarchical clustering to identify key molecules and features in the sample time series. We also demonstrate the impact of prescaling the data, a common technique when applying neural networks.</p>
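The pipeline described above (prescale, compress with an autoencoder, then cluster the latent space) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' method: the data are synthetic stand-ins (not the Beijing/Delhi measurements), prescaling is assumed to be log1p followed by per-feature z-scoring, the autoencoder is assumed to be a single-hidden-layer tanh network trained with Adam on mean squared reconstruction error, and the latent codes are clustered with k-means (k-means++ seeding). The abstract specifies none of these choices, and the names `prescale`, `top_share`, `Autoencoder` and `kmeans` are hypothetical.

```python
import numpy as np

def prescale(X):
    """Assumed prescaling: log1p, then z-score each feature (column)."""
    Xl = np.log1p(X)
    sd = Xl.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing constant features by zero
    return (Xl - Xl.mean(axis=0)) / sd

def top_share(X, n_top=5):
    """Fraction of total variance carried by the first n_top columns."""
    var = X.var(axis=0)
    return var[:n_top].sum() / var.sum()

class Autoencoder:
    """Hypothetical minimal autoencoder: tanh encoder (d -> k), linear decoder (k -> d)."""

    def __init__(self, n_features, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.p = {
            "W1": rng.normal(0.0, n_features ** -0.5, (n_features, n_latent)),
            "b1": np.zeros(n_latent),
            "W2": rng.normal(0.0, n_latent ** -0.5, (n_latent, n_features)),
            "b2": np.zeros(n_features),
        }

    def encode(self, X):
        return np.tanh(X @ self.p["W1"] + self.p["b1"])

    def fit(self, X, epochs=300, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        """Train encoder and decoder jointly (Adam) on mean squared reconstruction error."""
        p = self.p
        m = {name: np.zeros_like(a) for name, a in p.items()}
        v = {name: np.zeros_like(a) for name, a in p.items()}
        losses = []
        for t in range(1, epochs + 1):
            Z = self.encode(X)
            err = Z @ p["W2"] + p["b2"] - X  # reconstruction residual
            losses.append(float(np.mean(err ** 2)))
            g_out = 2.0 * err / err.size  # d(MSE)/d(reconstruction)
            g_pre = (g_out @ p["W2"].T) * (1.0 - Z ** 2)  # back through tanh
            grads = {"W2": Z.T @ g_out, "b2": g_out.sum(axis=0),
                     "W1": X.T @ g_pre, "b1": g_pre.sum(axis=0)}
            for name, g in grads.items():  # Adam update with bias correction
                m[name] = beta1 * m[name] + (1 - beta1) * g
                v[name] = beta2 * v[name] + (1 - beta2) * g ** 2
                m_hat = m[name] / (1 - beta1 ** t)
                v_hat = v[name] / (1 - beta2 ** t)
                p[name] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        return losses

def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centres = [Z[rng.integers(n)]]
        for _ in range(k - 1):  # sample next centre proportional to squared distance
            d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
            centres.append(Z[rng.choice(n, p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        inertia = d2[np.arange(n), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# Synthetic stand-in data (NOT the study's measurements): 3 hypothetical sample groups,
# 20 samples each, 200 features; columns 0-4 are intense peaks carrying no group signal.
rng = np.random.default_rng(42)
n_groups, n_per, n_features = 3, 20, 200
truth = np.repeat(np.arange(n_groups), n_per)
profiles = rng.lognormal(0.0, 1.0, size=(n_groups, n_features))
X_raw = profiles[truth] * rng.lognormal(0.0, 0.1, size=(truth.size, n_features))
X_raw[:, :5] = rng.lognormal(8.0, 1.0, size=(truth.size, 5))

X = prescale(X_raw)
ae = Autoencoder(n_features, n_latent=2, seed=0)
losses = ae.fit(X)
Z = ae.encode(X)  # 200 features -> 2 latent dimensions
labels = kmeans(Z, k=n_groups)

print(f"variance share of 5 intense peaks: raw {top_share(X_raw):.3f}, prescaled {top_share(X):.3f}")
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("cluster labels:", labels)
```

The variance shares illustrate why prescaling matters here. In the raw matrix, a handful of intense peaks carry almost all of the variance, so Euclidean distances mostly reflect those peaks. After z-scoring, every feature contributes equally to the total variance. For the traditional baseline, hierarchical clustering could be run on the prescaled matrix `X`, for example with SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(..., criterion="maxclust")`.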