The Use of Mel Cepstral Coefficients and Markov Models for the Automatic Identification, Classification and Sequence Modelling of Salient Sound Events Occurring During Tennis Matches

Gordon J Hunter, Ahmed I Shihab, Krzysztof Zienowicz

doi:10.1121/1.2934206

Abstract

Some significant events in sports matches occur too quickly to be detected by conventional video. Audio signals, normally sampled at a much higher rate, provide a way to detect such short events. Here, we employ methods used in automatic speech recognition (templates of Mel Frequency Cepstral Coefficients (MFCCs) over several adjacent time windows, combined with Principal Components Analysis (PCA)) to identify and classify sound events occurring during tennis matches: different tennis strokes, ball bounces, echoes, speech and applause. Excellent success rates were achieved for both event detection (97.74%) and correct classification (on average 98.64% across all classes) of 1504 sound events in the available recordings. The successful classification rate varied between classes, but no class had a success rate below 94%. This could be valuable to spectators, officials and coaches in tennis and other sports, including cricket, baseball and golf, or could be used to make video games (such as those for the Nintendo Wii) more realistic. We also model sequences of these events in both discrete (event-oriented) and continuous time domains, using Markov and other models, to give our system predictive as well as reactive capability, to help identify "unusual" or "unexpected" salient sounds, and potentially to improve the correct classification rate for the classes where the performance was weakest.
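
As an illustration of the discrete (event-oriented) sequence modelling mentioned above, the following is a minimal sketch of a first-order Markov model estimated from a sequence of classified sound events. It is not the authors' implementation: the event labels, the function names `transition_matrix` and `surprise`, the uniform fallback for unseen states, and the use of negative log probability as an "unexpectedness" score are all assumptions made for the example.

```python
import math
import numpy as np

# Hypothetical event labels, loosely based on the classes named in the abstract.
EVENTS = ["stroke", "bounce", "echo", "speech", "applause"]


def transition_matrix(sequence, states):
    """Estimate first-order Markov transition probabilities from one label sequence."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    # Count each observed (previous event -> next event) pair.
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[index[prev], index[nxt]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    # Assumption: states with no observed outgoing transition get a uniform row.
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / len(states))


def surprise(probs, states, prev, nxt):
    """Negative log probability of a transition; larger means more 'unexpected'."""
    p = probs[states.index(prev), states.index(nxt)]
    return math.inf if p == 0 else -math.log(p)


# Toy sequence of classified events (illustrative, not data from the paper).
observed = ["stroke", "bounce", "stroke", "bounce", "applause"]
P = transition_matrix(observed, EVENTS)
```

With such a matrix, a transition that was never seen in the training sequence (for example `surprise(P, EVENTS, "stroke", "applause")`) receives an infinite score, which is one simple way a system could flag an "unusual" event for closer inspection.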
