Abstract

We analyse pretrained and non-pretrained deep neural models for detecting 10-second bowel sound (BS) audio segments in continuous audio data streams. The models include MobileNet, EfficientNet, and Distilled Transformer architectures. Models were first trained on AudioSet, then transferred to and evaluated on 84 hours of labelled audio data from eighteen healthy participants. The evaluation data was recorded in a semi-naturalistic daytime setting, including movement and background noise, using a smart shirt with embedded microphones. The collected dataset was annotated for individual BS events by two independent raters with substantial agreement (Cohen's kappa κ = 0.74). Leave-One-Participant-Out cross-validation for detecting 10-second BS audio segments, i.e. segment-based BS spotting, yielded best F1 scores of 73% with transfer learning and 67% without. The best model for segment-based BS spotting was EfficientNet-B2 with an attention module. Our results show that pretraining could improve the F1 score by up to 26%, in particular by increasing robustness against background noise. Our segment-based BS spotting approach reduces the amount of audio data to be reviewed by experts from 84 h to 11 h, a reduction of ∼87%.
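As an illustration of the segment-based evaluation described above, the following minimal Python sketch maps event-level annotations to 10-second segment labels, computes a segment-level F1 score, and reproduces the review-time reduction arithmetic (84 h to 11 h). The labelling rule (a segment counts as positive if any annotated BS event overlaps it), the function names, and the toy data are assumptions made for this example; they are not taken from the study.

```python
from typing import List, Tuple

SEGMENT_S = 10.0  # segment length stated in the abstract


def events_to_segment_labels(
    events: List[Tuple[float, float]], total_s: float, segment_s: float = SEGMENT_S
) -> List[int]:
    """Label each full segment 1 if any (start, end) event overlaps it.

    Assumption: any overlap makes a segment positive; the study may use
    a different rule.
    """
    n_segments = int(total_s // segment_s)
    labels = []
    for i in range(n_segments):
        seg_start, seg_end = i * segment_s, (i + 1) * segment_s
        hit = any(start < seg_end and end > seg_start for start, end in events)
        labels.append(1 if hit else 0)
    return labels


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive (BS) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def review_reduction(total_h: float, flagged_h: float) -> float:
    """Fraction of audio that experts no longer need to review."""
    return 1.0 - flagged_h / total_h


if __name__ == "__main__":
    # Toy annotations (hypothetical): two BS events in 40 s of audio.
    truth = events_to_segment_labels([(3.0, 4.5), (19.0, 21.0)], total_s=40.0)
    predicted = [1, 0, 1, 1]  # hypothetical model output per segment
    print(truth, f1_score(truth, predicted))
    print(round(review_reduction(84.0, 11.0), 2))  # figures from the abstract
```

With the figures from the abstract, `review_reduction(84.0, 11.0)` gives roughly 0.87, matching the stated ∼87% reduction.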
