Abstract

To train Machine Listening models that classify sounds, we need to define recognizable names, attributes, relations, and interactions that produce acoustic phenomena. In this talk, we review examples of different types of categorization and how they drive Machine Listening. Categorization of sounds guides the annotation of audio datasets and the design of models, but it can also limit performance and the quality with which acoustic phenomena are expressed. Categories can simply be named after the sound source, or they can be inspired by Cognition (e.g., taxonomies), Psychoacoustics (e.g., adjectives), or Psychomechanics (e.g., materials). Classes of these types are often defined by one or two words. However, to acoustically identify a sound event we may instead require a full descriptive sentence: for example, “malfunctioning escalator” versus “a repeated low-frequency scraping and rubber-band snapping.” In any case, language offers only a limited set of lexicalized terms for describing acoustic phenomena, and language shapes both a listener's perception of a phenomenon and their ability to express it. For example, the sound of water is one of the most distinguishable sounds, yet how do we describe it without using the word water? Despite these limitations of language, we should still be able to automatically recognize the acoustic content of an audio signal at least as well as humans do.
