Abstract

The most essential step towards semiautomatic extraction of relevant surgery scenes is semantic understanding of surgical actions in surgery videos. Currently, Convolutional Neural Networks (CNNs) are the de-facto standard for automatic content classification in many domains, including medical imaging. We aim to increase the predictive performance of surgical action recognition within gynecologic laparoscopy, a subfield of endoscopic surgery, by fusing temporal information into the input layer of CNNs (early fusion) and by temporally aggregating single-frame prediction results (late fusion). Our evaluation shows that the proposed early fusion approaches outperform a single-frame baseline when using the GoogLeNet architecture. Moreover, early fusion of motion information benefits classification performance regardless of the late fusion strategy. Late fusion has a strong impact on classification performance, and its gains are additive to those of early fusion. Finally, we found that CNN capacity drastically influences these results. We conclude that the proposed methods, in combination with a sufficiently high CNN capacity, allow for a substantial increase in predictive performance.
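To make the two fusion strategies concrete, the following is a minimal PyTorch sketch: early fusion stacks consecutive frames along the channel axis so the CNN's input layer sees the temporal context directly, while late fusion averages per-frame class scores over a temporal window. The toy network, the window sizes, and the eight action classes are illustrative assumptions, not the paper's actual setup (which used GoogLeNet).

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Toy CNN whose input layer accepts a stack of K consecutive RGB
    frames (3*K channels), i.e. early fusion of temporal information."""
    def __init__(self, num_frames: int = 5, num_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * num_frames, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        # x: (batch, 3*num_frames, H, W), i.e. a channel-stacked clip
        return self.classifier(self.features(x).flatten(1))

def late_fusion(frame_logits: torch.Tensor) -> torch.Tensor:
    """Aggregate per-frame predictions over a temporal window by
    averaging softmax scores (one common late-fusion strategy)."""
    return frame_logits.softmax(dim=-1).mean(dim=0)

# Early fusion: stack 5 consecutive frames along the channel axis.
frames = torch.randn(5, 3, 224, 224)
early_input = frames.reshape(1, 15, 224, 224)
clip_logits = EarlyFusionNet()(early_input)

# Late fusion: average 9 single-frame predictions across a window.
window_logits = torch.randn(9, 8)
window_probs = late_fusion(window_logits)
```

The two strategies are independent, which is why their performance gains can combine additively: early fusion changes what the network sees, late fusion changes how its per-frame outputs are combined.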
