Abstract
Acoustic indices derived from environmental soundscape recordings are increasingly used to monitor ecosystem health and the biodiversity of vocal animals. Soundscape data can quickly become expensive and difficult to manage, so data compression or temporal down-sampling are sometimes employed to reduce storage and transmission costs. These parameters vary widely between experiments, and the consequences of this variation remain largely unknown.

We analyse field recordings from North-Eastern Borneo across a gradient of historical land use. We quantify the impact of experimental parameters (MP3 compression, recording length and temporal subsetting) on two types of soundscape descriptor: Analytical Indices and a convolutional neural network derived AudioSet Fingerprint. Both descriptor types were tested for their robustness to parameter alteration and their usability in a soundscape classification task.

We find that compression and recording length both drive considerable variation in calculated index values. However, the effect of this variation, and of temporal subsetting, on the performance of classification models is minor: performance is much more strongly determined by the choice of acoustic index, with AudioSet fingerprinting offering substantially greater (12%–16%) classifier accuracy, precision and recall.

We advise using the AudioSet Fingerprint in soundscape analysis, as it shows superior and consistent performance even on small pools of data. If data storage is a bottleneck to a study, we recommend Variable Bit Rate encoded compression (quality = 0), which reduces files to 23% of their original size without affecting most Analytical Index values. Recordings destined for the AudioSet Fingerprint can be compressed further, to a Constant Bit Rate encoding of 64 kb/s (8% of original file size), without any detectable effect. These recommendations allow the efficient use of restricted data storage whilst permitting comparability of results between different studies.
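For concreteness, the two MP3 settings recommended above (VBR at quality 0, and CBR at 64 kb/s) correspond to standard options of ffmpeg's LAME encoder. The sketch below builds the commands as argument lists rather than executing them; the input and output file names are placeholders, and running the commands would require ffmpeg to be installed.

```python
# Hypothetical ffmpeg invocations for the two compression settings discussed
# in the abstract. File names are placeholders.
vbr_cmd = [
    "ffmpeg", "-i", "recording.wav",
    "-codec:a", "libmp3lame",
    "-q:a", "0",               # VBR encoding, quality = 0 (highest VBR quality)
    "recording_vbr0.mp3",
]
cbr_cmd = [
    "ffmpeg", "-i", "recording.wav",
    "-codec:a", "libmp3lame",
    "-b:a", "64k",             # CBR encoding at 64 kb/s
    "recording_cbr64.mp3",
]
print(" ".join(vbr_cmd))
print(" ".join(cbr_cmd))
```

Either command list could be passed to `subprocess.run` to perform the actual encoding.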
Highlights
Animal vocalizations come together with abiotic and human-made sounds to form soundscapes
Confirming prior findings (Sethi, Jones, et al., 2020), our models consistently performed better when classifiers were trained on the AudioSet Fingerprint rather than on Analytical Indices (accuracy: +16.9%, z = 10.38, p < .001; precision: +15.5%, z = 9.72, p < .001; recall: +16.9%, z = 10.22, p < .001; full model outputs in Appendix S1: Supplementary 9C)
We have shown that the choice of acoustic index is key, and we confirm (Sethi, Jones, et al., 2020) that a multidimensional, generalist classifier input (the AudioSet Fingerprint) outperforms more traditional Analytical Indices regardless of the level of audio compression or the recording schedule
Summary
Animal vocalizations come together with abiotic and human-made sounds to form soundscapes. Analytical Indices are a type of acoustic index: summary statistics that describe the distribution of acoustic energy within a recording (Towsey et al., 2014), over 60 of which have been designed to capture aspects of biodiversity (Buxton et al., 2018; Sueur et al., 2014). These are commonly used in combination to compare the occupancy of acoustic niches, temporal variation and the general level of acoustic activity (Bradfer-Lawrence et al., 2019) across ecological gradients or in classification tasks (Gómez et al., 2018). However, results are often difficult to compare between studies. This may result from a lack of standardization: differing index selection, data storage methods and recording protocols all lead to unassessed variation in experimental outputs (Araya-Salas et al., 2019; Bradfer-Lawrence et al., 2019; Sugai et al., 2019)
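To make the idea of a "summary statistic of acoustic energy distribution" concrete, the sketch below computes a normalised spectral entropy from a waveform. This is an illustrative toy index, not one of the 60+ published Analytical Indices; the sampling rate, FFT size and test signals are all assumptions for the example.

```python
import numpy as np

def spectral_entropy(signal, n_fft=512):
    """Normalised Shannon entropy of the power spectrum: a simple summary
    of how acoustic energy is spread across frequencies (0 = one frequency
    dominates, 1 = energy spread evenly). Illustrative sketch only."""
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft)) ** 2
    p = spectrum / spectrum.sum()
    p = p[p > 0]  # drop empty bins before taking logs
    return float(-(p * np.log2(p)).sum() / np.log2(len(spectrum)))

# A pure tone concentrates energy in a few bins (entropy near 0),
# while white noise spreads it across all bins (entropy near 1).
sr = 16000                                   # assumed sampling rate (Hz)
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)          # 1 kHz sine
noise = np.random.default_rng(0).standard_normal(sr)

e_tone = spectral_entropy(tone)
e_noise = spectral_entropy(noise)
print(f"tone: {e_tone:.3f}, noise: {e_noise:.3f}")
```

A real Analytical Index (e.g. the Acoustic Complexity Index) is computed in the same spirit: a fixed transformation of the spectrogram that reduces a recording to one or a few comparable numbers.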