More than 900,000 deaths were caused by colorectal cancer (CRC) in 2020. Colonoscopy is the gold standard for CRC screening, with studies concluding that colonoscopies significantly reduce mortality from CRC. The literature shows that computer-aided detection (CAD) systems can improve adenoma detection. In particular, deep learning models have shown promising results in helping physicians reduce the number of missed lesions during real-time colonoscopies. To keep up with the increasing resolution of colonoscopies and to perform real-time inference on smaller medical devices, faster models (i.e., models that can process images at high frame rates) are required.
In this work, we use YOLOv4 to detect polyps, which are known to be CRC precursors. To further increase the inference speed of the model, achieve real-time performance, and make the model smaller, we deployed it on NVIDIA TensorRT, which optimizes and quantizes the model using different floating-point and integer representation levels down to 8-bit.
Different methods of regularization, data pre-processing, and data augmentation were tested. We analyzed various data augmentation methods by adding each technique, one at a time, on top of those already in use and assessing the resulting improvement. The effects of transfer learning, custom anchor boxes, different optimizers, Cross-Iteration Batch Normalization (CBN), and Distance-Intersection over Union Non-Maximum Suppression (DIoU-NMS) were also analyzed.
We used publicly available datasets to train, test, and validate our model to facilitate comparison with other studies. To evaluate the inference speed, a publicly available video dataset was used. We achieved 82.93% mAP, 81.44% precision, and 75.96% recall on the Etis-Larib dataset, and 90.96% mAP, 88.65% precision, 87.62% recall, 88.23% F1, and 87.86% F2 on the CVC-ClinicDB dataset.
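For readers unfamiliar with the F1 and F2 scores quoted above, the following is a minimal sketch of the standard F-beta definition, applied to the CVC-ClinicDB precision and recall. The function name `f_beta` is hypothetical, and the sketch assumes the scores derive from a single aggregate precision and recall; values computed this way may differ slightly from the reported ones if the metrics were aggregated differently or computed from unrounded figures.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when there are no true positives
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# CVC-ClinicDB precision and recall as quoted in the abstract (as fractions).
precision, recall = 0.8865, 0.8762

f1 = f_beta(precision, recall, beta=1.0)  # harmonic mean of precision and recall
f2 = f_beta(precision, recall, beta=2.0)  # recall-weighted variant

print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

F2 is a natural choice in polyp detection because a missed lesion (low recall) is usually costlier than a false alarm.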
Using an NVIDIA RTX 2080 Ti graphics processing unit (GPU), a speed of approximately 98 frames per second (FPS) was obtained on three videos from the Colonoscopic video dataset. For the FP16 version of the implementation, the inference speed increased to 163 FPS. For the INT8 version of the model, the inference speed increased further to approximately 172 FPS.
An important conclusion was that at the FP16 and INT8 levels of quantization, the model's accuracy metrics degraded little and, in some cases, even improved, suggesting that quantization can have a regularization effect. This implies that the network may become smaller, faster, and more precise (which could prove essential for resource-constrained devices) and that quantization might also be used as a regularization technique. In addition, data augmentation techniques proved crucial to overcoming the lack of labeled data, which is common in the medical field for privacy reasons.
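To put the reported frame rates in perspective, the short sketch below converts them into relative speedups and per-frame latencies. The label `baseline` is a hypothetical name for the 98 FPS configuration, whose numeric precision is not stated here; the FPS figures themselves are the approximate values quoted above.

```python
# Approximate frame rates quoted in the abstract (RTX 2080 Ti).
# "baseline" is an illustrative label for the 98 FPS configuration.
fps = {"baseline": 98.0, "FP16": 163.0, "INT8": 172.0}

base = fps["baseline"]

# Speedup relative to the baseline and time budget per frame in milliseconds.
speedup = {name: rate / base for name, rate in fps.items()}
latency_ms = {name: 1000.0 / rate for name, rate in fps.items()}

for name in fps:
    print(f"{name:>8}: {fps[name]:6.1f} FPS, "
          f"{speedup[name]:.2f}x baseline, {latency_ms[name]:.2f} ms/frame")
```

Read this way, most of the gain comes from the move to FP16, with INT8 adding a comparatively small further improvement.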