Laser Doppler vibrometers (LDVs) are exceptionally well suited to non-contact vibration sensing in a variety of environments. This work focuses on the diarisation of conversations that might be recorded via a drone-mounted LDV: reducing the effect of external noise, extracting useful features from frames of audio, and clustering those frames into homogeneous segments based on speaker identity. The two-step noise reduction (TSNR) technique was applied to these vibroacoustic data for the first time and tested against Gaussian bandpass filtering for reducing noise from sources such as laser speckle and additional broadband ‘white’ noise. Feature extraction was then performed using a time-delay neural network, and the assignment of frames to a particular speaker was tested with various clustering methods. Each combination of noise reduction and clustering technique was tested on a two-speaker conversation recorded via the LDV. With no added noise, the most effective combination was TSNR followed by agglomerative hierarchical clustering (AHC), with a diarisation error rate of 6.13%. With additional broadband noise, the most effective combination was TSNR followed by Gaussian bandpass filtering and then clustering via AHC, with a diarisation error rate of 11.9%. This work addresses another aspect of the challenge of covertly obtaining and interpreting vibroacoustic intelligence in remote and hostile environments using LDVs.
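For readers unfamiliar with the metric, the diarisation error rate quoted above is conventionally the sum of missed speech, false-alarm speech and speaker confusion, divided by the total reference speech time. The following is a minimal frame-level sketch of that definition, not the scoring procedure used in this work: the function name `frame_der`, the use of `None` to mark non-speech frames, and the absence of a forgiveness collar or overlapped-speech handling are all assumptions made for illustration.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarisation error rate (illustrative simplification).

    reference, hypothesis: equal-length sequences holding one label per
    frame; None marks a non-speech frame. Hypothesis cluster labels are
    arbitrary, so the best one-to-one mapping onto reference speakers is
    found by exhaustive search (fine for a handful of speakers).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")

    total_speech = sum(r is not None for r in reference)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    # Unique labels in order of first appearance.
    ref_speakers = list(dict.fromkeys(r for r in reference if r is not None))
    hyp_clusters = list(dict.fromkeys(h for h in hypothesis if h is not None))

    # Pad with None so surplus hypothesis clusters can stay unmatched.
    n = max(len(ref_speakers), len(hyp_clusters))
    candidates = ref_speakers + [None] * (n - len(ref_speakers))

    best_correct = 0
    for perm in permutations(candidates, len(hyp_clusters)):
        mapping = dict(zip(hyp_clusters, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical two-speaker example: one of eight frames is assigned
# to the wrong cluster, so one speech frame in eight is in error.
ref = ["A", "A", "A", "B", "B", "B", "B", "A"]
hyp = [0, 0, 1, 1, 1, 1, 1, 0]
der = frame_der(ref, hyp)
```

Established scoring tools typically work on timed segments rather than fixed frames and may apply a collar around speaker boundaries, so values from this sketch are not directly comparable with the figures reported here.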