Abstract

Because data acquisition is difficult and manual annotation is expensive, respiratory sound classification suffers from limited training samples, which restricts the performance of existing methods. To learn more from the limited samples, we previously proposed a contrastive embedding learning method that incorporates additional out-of-class information into the model. However, because that method mapped each entire sample to a deep embedding vector and modelled the distribution of those embeddings, it captured little of the detailed information within each sample. In fact, a sample is a finite combination of various components, and the classification task is essentially to detect the presence of components that contain adventitious sounds, so detailed component-wise information is crucial. To this end, this paper proposes a patch-level contrastive embedding learning method that operates on finer-grained patches. It divides each sample into multiple patches and maps the patches to the embedding space. The patches are grouped into subclasses according to the type of adventitious sound each patch contains. Because patch-level labels are unavailable in most cases, a Multi-Instance Learning (MIL) based approach is designed to estimate them. Then, by modelling the intra- and inter-subclass distances between the patch-level embeddings, the method learns detailed information about the differences between patches, which benefits the identification task. Results on the ICBHI dataset under the random and official splits show that our method achieves 79.99% and 52.95%, exceeding our previous method by 1.81% and 1.58%, respectively.
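The pipeline described above (patch splitting, MIL-based label estimation, and distance modelling between patch embeddings) can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the score threshold, the "highest-scoring patch" fallback, and the classic pairwise margin loss are all assumptions chosen for illustration, and label 0 is assumed to mean "no adventitious sound".

```python
import math


def split_into_patches(frames, patch_len, hop):
    """Split a sequence of frames into fixed-length, possibly overlapping patches."""
    return [frames[i:i + patch_len]
            for i in range(0, len(frames) - patch_len + 1, hop)]


def estimate_patch_labels(scores, sample_label, threshold=0.5):
    """Estimate patch labels from a sample-level label (hypothetical MIL rule).

    Assumption: a normal sample (label 0) contains only normal patches, while
    an abnormal sample contains at least one patch of its own class.
    `scores` are per-patch abnormality scores from some upstream model.
    """
    if sample_label == 0:
        return [0] * len(scores)
    labels = [sample_label if s >= threshold else 0 for s in scores]
    if not any(labels):
        # Enforce "at least one positive instance per positive bag".
        best = max(range(len(scores)), key=scores.__getitem__)
        labels[best] = sample_label
    return labels


def contrastive_loss(embeddings, labels, margin=1.0):
    """Mean pairwise margin-based contrastive loss over patch embeddings.

    Same-subclass pairs are pulled together (squared distance); pairs from
    different subclasses are pushed apart until they are `margin` away.
    This is the classic pairwise form, used here as a stand-in.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = math.dist(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

For example, `estimate_patch_labels([0.1, 0.9, 0.3], 2)` marks only the second patch as subclass 2 and leaves the others as normal. The estimated labels can then be passed to `contrastive_loss` together with the patch embeddings.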
