Abstract

The most essential step towards semiautomatic extraction of relevant surgery scenes is semantic understanding of surgical actions in surgery videos. Currently, Convolutional Neural Networks (CNNs) are the de-facto standard for automatic content classification in many domains, including medical imaging. We aim to increase the predictive performance of surgical action recognition within gynecologic laparoscopy, a subfield of endoscopic surgery, by fusing temporal information into the input layer of CNNs (early fusion) and by temporally aggregating single-frame prediction results (late fusion). Our evaluation shows that the proposed early fusion approaches outperform a single-frame baseline when using the GoogLeNet architecture. Moreover, early fusion of motion information benefits classification performance regardless of the late fusion strategy. Late fusion has a strong impact on classification performance, and its gains are additive to those of early fusion. Finally, we found that CNN capacity drastically influences these results. We conclude that the proposed methods, in combination with a sufficiently high CNN capacity, allow for a substantial increase in predictive performance.
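To make the two fusion strategies concrete, the following is a minimal PyTorch sketch: early fusion stacks consecutive frames along the channel axis so the CNN's input layer sees the temporal context directly, while late fusion averages per-frame class scores over a temporal window. The toy network, the window sizes, and the eight action classes are illustrative assumptions, not the paper's actual setup (which used GoogLeNet).

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Toy CNN whose input layer accepts a stack of K consecutive RGB
    frames (3*K channels), i.e. early fusion of temporal information."""
    def __init__(self, num_frames: int = 5, num_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * num_frames, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        # x: (batch, 3*num_frames, H, W), i.e. a channel-stacked clip
        return self.classifier(self.features(x).flatten(1))

def late_fusion(frame_logits: torch.Tensor) -> torch.Tensor:
    """Aggregate per-frame predictions over a temporal window by
    averaging softmax scores (one common late-fusion strategy)."""
    return frame_logits.softmax(dim=-1).mean(dim=0)

# Early fusion: stack 5 consecutive frames along the channel axis.
frames = torch.randn(5, 3, 224, 224)
early_input = frames.reshape(1, 15, 224, 224)
clip_logits = EarlyFusionNet()(early_input)

# Late fusion: average 9 single-frame predictions across a window.
window_logits = torch.randn(9, 8)
window_probs = late_fusion(window_logits)
```

The two strategies are independent, which is why their performance gains can combine additively: early fusion changes what the network sees, late fusion changes how its per-frame outputs are combined.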
