The process of manually annotating sports footage is a demanding one. In American football alone, coaches spend thousands of hours reviewing and analyzing videos each season. We aim to automate this process by developing a system that generates comprehensive statistical reports from full-length football game videos. Having previously demonstrated the proof of concept for our system, here, we present optimizations to our preprocessing techniques along with an inventive method for multi-person event detection in sports videos. Employing a long short-term memory (LSTM)-based architecture to detect the snap in American football, we achieve an outstanding LSI (Levenshtein similarity index) of 0.9445, suggesting a normalized difference of less than 0.06 between predictions and ground truth labels. We also illustrate the utility of snap detection as a means of identifying the offensive players’ assuming of formation. Our results exhibit not only the success of our unique approach and underlying optimizations but also the potential for continued robustness as we pursue the development of our remaining system components.
Read full abstract