Fish Species Recognition from Video using SVM Classifier

Abstract

Building detailed knowledge of biodiversity, of the geographical distribution of living species and of their evolution is essential for sustainable development and for the preservation of this biodiversity. Massive databases of underwater surveillance video have recently been made available to support the design of fish identification algorithms. However, these video datasets have rather poor resolution, and they are challenging both because of the natural phenomena involved, such as murky water and seaweed moving with the water current, and because of the huge amount of data to be processed. We have designed a processing chain based on background segmentation, keypoint selection with an adaptive scale, description with OpponentSIFT, and learning of each species with a binary linear Support Vector Machine (SVM) classifier. Our algorithm was evaluated as part of our participation in the Fish task of the LifeCLEF 2014 challenge. Compared with the baseline designed by the LifeCLEF challenge organizers, our approach achieves better precision but worse recall. Our species recognition performance (measured only on correctly detected bounding boxes) is comparable to the baseline, but our bounding boxes are often too large, which penalizes our score. Overall, our results are encouraging.
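
Why oversized bounding boxes lower the score can be illustrated with intersection-over-union (IoU) matching. This is a minimal sketch that assumes a PASCAL VOC-style rule, in which a detection counts as correct only if its IoU with a ground-truth box is at least 0.5; the exact matching rule of the LifeCLEF 2014 Fish task may differ, and the box coordinates are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def area(box: Box) -> float:
    """Area of an axis-aligned box; zero if the box is empty or inverted."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Assumed matching rule: IoU at or above the threshold counts as a hit."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    fish = (100, 100, 140, 120)   # hypothetical ground-truth box, 40 x 20 px
    tight = (98, 98, 142, 122)    # prediction slightly larger than the fish
    loose = (80, 85, 160, 135)    # prediction twice as wide and 2.5 times as tall
    print(round(iou(tight, fish), 3), is_match(tight, fish))  # 0.758 True
    print(round(iou(loose, fish), 3), is_match(loose, fish))  # 0.2 False
```

Both predictions contain the whole fish, but the loose box reaches an IoU of only 0.2, so under this assumed rule it would count as a false positive while the fish itself would count as a miss.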

Similar Papers
  • Research Article
  • Cited by 26
  • 10.1177/03611981221128806
Automated Vehicle to Vehicle Conflict Analysis at Signalized Intersections by Camera and LiDAR Sensor Fusion
  • Oct 28, 2022
  • Transportation Research Record: Journal of the Transportation Research Board
  • Alabi Mehzabin Anisha + 4 more

This research presents a robust approach for automated safety diagnosis using sensor fusion techniques. This work fuses the outputs of a roadside low-resolution camera and a solid-state LiDAR. The YOLO-v5 object detection model was used for vehicle detection and classification in the videos. The raw 3D point clouds generated by the LiDAR are processed in two manual steps (ground plane transformation and background segmentation) and two real-time steps (foreground clustering and bounding box fitting). Taking the 2D bounding boxes generated by both the camera and the LiDAR, we associate corresponding bounding box pairs by thresholding the Euclidean distance between their centroids at 6 ft. We perform weighted measurement updates based on the root mean square error of each sensor's detections compared with manually labeled ground truths. The fused measurements are tracked using a linear constant-velocity Kalman filter. From the generated vehicle trajectories, we compute post-encroachment time for pixel-level conflicts. We propose a complete bipartite graph-matching strategy between vehicle parts, combined with the conflict angle, to classify conflicts as rear-end, sideswipe, head-on, or angle conflicts. A case study on a signalized intersection is presented. The proposed framework achieves 97.384% precision and 95.316% recall, outperforming both single-sensor systems in detection count and localization. The proposed method is expected to help diagnose road safety problems and inform the required countermeasures.
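
The association and weighting steps can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are reduced to ground-plane centroids in feet, pairs are matched greedily (closest first) within the 6 ft threshold, and each sensor is weighted by the inverse of its squared RMSE (inverse-variance weighting). The abstract does not give the exact update rule, and all coordinates and RMSE values below are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # ground-plane centroid (x, y), in feet


def associate(cam: List[Point], lidar: List[Point],
              max_dist: float = 6.0) -> List[Tuple[int, int]]:
    """Greedily pair camera and LiDAR centroids, closest first, within max_dist."""
    candidates = sorted(
        (math.dist(cam_pt, lidar_pt), i, j)
        for i, cam_pt in enumerate(cam)
        for j, lidar_pt in enumerate(lidar)
        if math.dist(cam_pt, lidar_pt) <= max_dist
    )
    used_cam, used_lidar, pairs = set(), set(), []
    for _, i, j in candidates:
        # Each detection may belong to at most one pair.
        if i not in used_cam and j not in used_lidar:
            pairs.append((i, j))
            used_cam.add(i)
            used_lidar.add(j)
    return pairs


def fuse(cam_pt: Point, lidar_pt: Point, rmse_cam: float, rmse_lidar: float) -> Point:
    """Weight each sensor by 1 / RMSE**2 (assumed inverse-variance rule)."""
    w_c, w_l = 1.0 / rmse_cam ** 2, 1.0 / rmse_lidar ** 2
    x = (w_c * cam_pt[0] + w_l * lidar_pt[0]) / (w_c + w_l)
    y = (w_c * cam_pt[1] + w_l * lidar_pt[1]) / (w_c + w_l)
    return (x, y)


if __name__ == "__main__":
    cam = [(10.0, 5.0), (40.0, 12.0)]    # hypothetical camera centroids
    lidar = [(11.0, 5.5), (80.0, 0.0)]   # hypothetical LiDAR centroids
    for i, j in associate(cam, lidar):   # only camera 0 and LiDAR 0 are within 6 ft
        print(i, j, fuse(cam[i], lidar[j], rmse_cam=2.0, rmse_lidar=1.0))
```

Because the LiDAR is assumed to have the smaller RMSE here, the fused position lies closer to the LiDAR centroid. The fused positions would then feed the constant-velocity Kalman filter, which is not shown.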

  • Research Article
  • Cited by 11
  • 10.1007/s12664-022-01331-7
Computer-aided automated diminutive colonic polyp detection in colonoscopy by using deep machine learning system; first indigenous algorithm developed in India.
  • Apr 1, 2023
  • Indian Journal of Gastroenterology
  • Srijan Mazumdar + 3 more

Colonic polyps can be detected and resected during a colonoscopy before cancer develops. However, about one quarter of polyps may be missed due to their small size, their location, or human error. An artificial intelligence (AI) system can improve polyp detection and reduce colorectal cancer incidence. We are developing an indigenous AI system to detect diminutive polyps in real-life scenarios that is compatible with any high-definition colonoscopy and endoscopic video-capture software. We trained a masked region-based convolutional neural network model to detect and localize colonic polyps. Three independent datasets of colonoscopy videos comprising 1,039 image frames were used and divided into a training dataset of 688 frames and a testing dataset of 351 frames. Of the 1,039 image frames, 231 were from real-life colonoscopy videos from our centre. The rest were publicly available image frames already modified to be directly usable for developing the AI system. The image frames of the testing dataset were also augmented by rotation and zooming to replicate the real-life image distortions seen during colonoscopy. The AI system was trained to localize each polyp by creating a 'bounding box'. It was then applied to the testing dataset to test its accuracy in detecting polyps automatically. The AI system achieved a mean average precision (equivalent to specificity) of 88.63% for automatic polyp detection. All polyps in the testing dataset were identified by the AI, i.e., there were no false-negative results in the testing dataset (sensitivity of 100%). The mean polyp size in the study was 5 (± 4) mm. The mean processing time per image frame was 96.4 minutes. When applied to real-life colonoscopy images with wide variations in bowel preparation and small polyp size, this AI system can detect colonic polyps with a high degree of accuracy.

  • Video Transcripts
  • 10.48448/38hq-8y89
Helicopter Video Dataset for Detecting and Tracking
  • Mar 12, 2021
  • Underline Science Inc.
  • Diego Marez + 3 more

Labeled real-world data is often difficult to obtain and is especially scarce in Navy-related domains. The relevant annotated data that does exist is frequently limited to large or medium-sized bounding boxes, making it difficult to train computer vision algorithms to recognize smaller objects of interest. In this work, we present a naval-specific video dataset of helicopter operations performed at sea. The dataset contains videos from multiple camera sensors to incorporate variations in lens distortion and camera noise. It consists of videos of one to three minutes each, recorded during Littoral Combat Ship (LCS) exercises off the California coast in the fall and winter. Special care was taken to emphasize small instances of helicopters relative to the field of view, so the dataset provides a more even ratio of small-, medium-, and large-sized bounding boxes for training more robust detectors and trackers. Following the conventions of the field, we define small, medium, and large objects as those whose bounding boxes cover less than 32 × 32 pixels, between 32 × 32 and 96 × 96 pixels, and more than 96 × 96 pixels in area, respectively. We benchmark object detection on these videos, with special consideration given to small-object mean average precision.
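
Under the COCO convention that this definition appears to follow, the thresholds apply to box area: below 32 × 32 = 1,024 square pixels is small, up to 96 × 96 = 9,216 square pixels is medium, and anything larger is large. The sketch below assumes that convention; how boxes lying exactly on a boundary are assigned is an assumption of this sketch, and the example boxes are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

SMALL_MAX = 32 * 32    # 1,024 square pixels
MEDIUM_MAX = 96 * 96   # 9,216 square pixels


def size_category(width: float, height: float) -> str:
    """Bucket a box by its area in square pixels (boundary handling assumed)."""
    box_area = width * height
    if box_area < SMALL_MAX:
        return "small"
    if box_area <= MEDIUM_MAX:
        return "medium"
    return "large"


def size_histogram(boxes: Iterable[Tuple[float, float]]) -> Counter:
    """Count small, medium, and large boxes from (width, height) pairs."""
    return Counter(size_category(w, h) for w, h in boxes)


if __name__ == "__main__":
    boxes = [(20, 30), (50, 40), (120, 90)]   # hypothetical helicopter boxes
    print(size_histogram(boxes))
```

A histogram like this is one simple way to check how evenly a dataset covers the three size ranges.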

  • Preprint Article
  • 10.1002/essoar.10501881.1
Developing a CNN for automated detection of Carolina bays from publicly available LiDAR data
  • Jan 18, 2020
  • Mark Lundine + 1 more

For over a century, the enigmatic Carolina bays have captivated geologists and spurred contentious debate about their origin. These circular to ovate, shallow depressions (median diameter of 222 m, median depth of 2.17 m, median area of 26,249 m²) span the Atlantic Coastal Plain (ACP) from northern Florida to southern New Jersey, with total count estimates ranging from 10,000 to 500,000. Using 1 m gridded, 1.7 km by 1.7 km LiDAR digital elevation models (DEMs) of Delaware as training images, a convolutional neural network (CNN) was trained to detect Carolina bays. With such a large and uncertain population size, mapping the Carolina bays requires an automated detection scheme: manual detection of bays from LiDAR across the entire Atlantic Coastal Plain would be extremely time-intensive and prone to human annotation errors. Using Faster R-CNN with the TensorFlow Python library, a network was trained on 978 LiDAR images for 24 hours (42,450 iterations) on an Intel Core i7-4790K CPU at 4.00 GHz. The network automatically detects bays in LiDAR images, outputting a bounding box and a confidence level for each. These bounding boxes can then be used to subset the DEM and analyze those regions for statistics on the bays' three-dimensional shape. Extending this algorithm to DEMs from other areas of the ACP will provide a better understanding of the bays' geographic distribution, as well as of any differences in morphology between geographic regions. This highly efficient method for detecting geomorphic features will provide better means of mapping various types of abundant geomorphic features in the future.

  • Conference Article
  • Cited by 6
  • 10.1109/icnsc.2015.7116081
Visual based fall detection with reduced complexity horprasert segmentation using superpixel
  • Apr 1, 2015
  • Chin-Jou Chong + 4 more

Apart from wearable sensors and floor sensors, remote fall detection systems can be realized using camera sensors and computer vision methods; such visual-based systems are accurate, non-intrusive, and capable of post-fall event analysis using the recorded video. To implement visual-based fall detection, the foreground segmentation process is crucial for providing the correct foreground region with useful features for fall detection and analysis. However, in an indoor environment, global illumination changes, shadows, and colour camouflage tend to occur and degrade foreground extraction. Existing techniques that attempt to overcome these issues come at the cost of higher computational complexity and longer processing times. Thus, an approach using the Horprasert algorithm combined with superpixel clustering is proposed to perform background modeling and background segmentation. The foreground extracted by the proposed method is then tested with two different fall detection methods: one using a bounding box and one using motion quantification with an approximated ellipse. The results show reduced complexity and improved processing speed, with little disparity compared with the original Horprasert segmentation.
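
For readers unfamiliar with Horprasert segmentation: each pixel's colour is compared with a per-pixel background mean and standard deviation through a brightness distortion (alpha) and a chromaticity distortion (CD), which lets shadows (darker, same chromaticity) be separated from true foreground. The sketch below follows that idea for a single RGB pixel. The thresholds are hypothetical placeholders, and it omits both the per-pixel normalisation of alpha and CD used in the original algorithm and the superpixel clustering proposed in this paper.

```python
from typing import Sequence, Tuple


def distortions(pixel: Sequence[float], mean: Sequence[float],
                std: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, cd) for one RGB pixel against its background statistics.

    alpha: brightness distortion, the scale of the background colour that best
           explains the pixel (1.0 means the same brightness).
    cd:    chromaticity distortion, the std-normalised distance between the
           pixel and the scaled background colour.
    """
    num = sum(p * m / s ** 2 for p, m, s in zip(pixel, mean, std))
    den = sum((m / s) ** 2 for m, s in zip(mean, std))
    alpha = num / den
    cd = sum(((p - alpha * m) / s) ** 2 for p, m, s in zip(pixel, mean, std)) ** 0.5
    return alpha, cd


def classify(pixel, mean, std, cd_max=3.0, alpha_lo=0.9, alpha_hi=1.1,
             alpha_floor=0.4) -> str:
    """Label a pixel; all thresholds are hypothetical placeholders."""
    alpha, cd = distortions(pixel, mean, std)
    if cd > cd_max or alpha < alpha_floor:
        return "foreground"      # different colour, or far too dark to be a shadow
    if alpha_lo <= alpha <= alpha_hi:
        return "background"      # same colour, similar brightness
    return "shadow" if alpha < 1.0 else "highlight"  # same colour, other brightness


if __name__ == "__main__":
    bg_mean, bg_std = (100.0, 100.0, 100.0), (5.0, 5.0, 5.0)  # hypothetical model
    for px in [(100.0, 100.0, 100.0), (60.0, 60.0, 60.0), (100.0, 40.0, 160.0)]:
        print(px, classify(px, bg_mean, bg_std))
```

In a full segmenter these tests run for every pixel against statistics learned from background frames; the darkened grey pixel is labelled a shadow rather than foreground because its chromaticity matches the background.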

  • Research Article
  • Cited by 32
  • 10.1001/jamanetworkopen.2022.26265
Development and Validation of a Model for Laparoscopic Colorectal Surgical Instrument Recognition Using Convolutional Neural Network–Based Instance Segmentation and Videos of Laparoscopic Procedures
  • Aug 19, 2022
  • JAMA Network Open
  • Daichi Kitaguchi + 8 more

Deep learning-based automatic surgical instrument recognition is an indispensable technology for surgical research and development. However, highly accurate pixel-level recognition is required to make it suitable for surgical automation. The objective of this study was to develop a deep learning model that can simultaneously recognize 8 types of surgical instruments frequently used in laparoscopic colorectal operations and to evaluate its recognition performance. This quality improvement study was conducted at a single institution with a multi-institutional data set. Laparoscopic colorectal surgical videos recorded between April 1, 2009, and December 31, 2021, were included in the video data set. Deep learning-based instance segmentation, an image recognition approach that recognizes each object individually and pixel by pixel instead of roughly enclosing it with a bounding box, was performed for 8 types of surgical instruments. Average precision, calculated from the area under the precision-recall curve, was used as the evaluation metric. The average precision reflects the numbers of true-positive, false-positive, and false-negative results, and the mean average precision over the 8 types of surgical instruments was calculated. Five-fold cross-validation was used as the validation method: the annotation data set was split into 5 segments, of which 4 were used for training and the remainder for validation. The data set was split at the per-case level instead of the per-frame level; thus, images extracted from an intraoperative video in the training set never appeared in the validation set. Validation was performed for all 5 validation sets, and the mean average precision was averaged across them. In total, 337 laparoscopic colorectal surgical videos were used. Pixel-by-pixel annotation was manually performed for 81 760 labels on 38 628 static images, constituting the annotation data set. 
The mean average precisions of the instance segmentation for surgical instruments were 90.9% for 3 instruments, 90.3% for 4 instruments, 91.6% for 6 instruments, and 91.8% for 8 instruments. A deep learning-based instance segmentation model that simultaneously recognizes 8 types of surgical instruments with high accuracy was successfully developed. The accuracy was maintained even when the number of types of surgical instruments increased. This model can be applied to surgical innovations, such as intraoperative navigation and surgical automation.
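
The per-case split described in this abstract (no video contributes frames to both training and validation) amounts to a group-wise k-fold split. A minimal sketch, assuming each frame carries a hypothetical case ID and that cases are assigned to folds round-robin; the study's actual fold assignment is not described. scikit-learn's `GroupKFold` offers the same per-group guarantee.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_kfold(case_ids: Sequence[str], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, val_idx) pairs; each case is validated in exactly one fold."""
    frames_by_case: Dict[str, List[int]] = defaultdict(list)
    for idx, case in enumerate(case_ids):
        frames_by_case[case].append(idx)
    # Assign whole cases (never individual frames) to folds, round-robin.
    folds: List[List[int]] = [[] for _ in range(k)]
    for n, case in enumerate(sorted(frames_by_case)):
        folds[n % k].extend(frames_by_case[case])
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    # Hypothetical frame-to-video mapping: 8 frames from 6 videos.
    ids = ["v1", "v1", "v2", "v3", "v3", "v4", "v5", "v6"]
    for train, val in group_kfold(ids, k=5):
        print("val cases:", sorted({ids[i] for i in val}))
```

Splitting by case rather than by frame prevents near-duplicate frames from the same operation from inflating validation scores.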

  • Research Article
  • Cited by 7
  • 10.1109/tmi.2024.3381209
Instrument-Tissue Interaction Detection Framework for Surgical Video Understanding.
  • Aug 1, 2024
  • IEEE transactions on medical imaging
  • Wenjun Lin + 7 more

Instrument-tissue interaction detection, which helps in understanding surgical activities, is vital for building computer-assisted surgery systems but poses many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and lacks the ability to automatically detect instruments and tissues. Second, existing works do not fully consider the intra- and inter-frame relations between instruments and tissues. In this paper, we propose to represent instrument-tissue interaction as an 〈instrument class, instrument bounding box, tissue class, tissue bounding box, action class〉 quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer that enhances features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer that incorporates features of proposals across adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections that exploit relationships between instruments and tissues in the same frame and inter-frame connections that model temporal information for the same instance. For evaluation, we build a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.

  • Conference Article
  • Cited by 2
  • 10.2316/p.2012.778-042
Optimization of Color-based Foreground / Background Segmentation for Outdoor Scenes
  • Jan 1, 2012
  • Louis St-Laurent + 2 more

Performing foreground / background segmentation in outdoor scenes is very challenging. Starting from a state-of-the-art color-based approach [8], we propose three improvements: the use of the YCoCg color space, a spherical association volume, and a cast-shadow management approach. Using image sequences with pixel-level ground truth, we quantitatively measured the performance of the proposed algorithm and showed that it reduces processing time while improving detection accuracy. We also introduce a new public dataset of outdoor videos with ground truth.
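
The YCoCg transform mentioned here is a simple linear change of color space that separates luma (Y) from two chroma components (Co, Cg). The sketch below shows the standard forward and inverse transforms for a single pixel; how the paper uses these components inside its spherical association volume is not described in the abstract and is not shown.

```python
from typing import Tuple


def rgb_to_ycocg(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Forward YCoCg transform: luma Y plus orange (Co) and green (Cg) chroma."""
    y = r / 4 + g / 2 + b / 4
    co = r / 2 - b / 2
    cg = -r / 4 + g / 2 - b / 4
    return y, co, cg


def ycocg_to_rgb(y: float, co: float, cg: float) -> Tuple[float, float, float]:
    """Exact inverse of rgb_to_ycocg."""
    tmp = y - cg                 # equals (r + b) / 2
    return tmp + co, y + cg, tmp - co


if __name__ == "__main__":
    print(rgb_to_ycocg(200, 100, 50))         # (112.5, 75.0, -12.5)
    print(ycocg_to_rgb(112.5, 75.0, -12.5))   # (200.0, 100.0, 50.0)
```

Grey pixels (R = G = B) map to Co = Cg = 0, so the chroma components carry color information independently of brightness, and the transform needs only additions and divisions by powers of two.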

  • Research Article
  • Cited by 17
  • 10.1145/3478513.3480520
Foids
  • Dec 1, 2021
  • ACM Transactions on Graphics
  • Yuko Ishiwaka + 6 more

We present a bio-inspired fish simulation platform, which we call "Foids", to generate realistic synthetic datasets for use in training computer vision algorithms. This is a first-of-its-kind synthetic dataset platform for fish, which generates all 3D scenes purely through simulation. One of the major challenges in deep learning-based computer vision is preparing the annotated dataset. It is already hard to collect a good-quality video dataset with enough variation; moreover, annotating a sufficiently large video dataset frame by frame is a painful process. This is especially true for fish datasets, because it is difficult to set up a camera underwater and the number of fish (target objects) in a scene can reach 30,000 in a fish cage on a fish farm. All of these fish need to be annotated with labels such as a bounding box or silhouette, which can take hours to complete manually, even for only a few minutes of video. We address this challenge by introducing a realistic synthetic dataset generation platform that incorporates details of biology and ecology studied in the aquaculture field. Because the scene is simulated, it is easy to generate scene data with annotation labels from the 3D mesh geometry data and transformation matrices. Using part of the synthetic dataset, we also develop an automated fish counting system whose counting accuracy is comparable to that of the human eye; it takes less time than the manual process and reduces physical injuries sustained by the fish.

  • Research Article
  • 10.21428/594757db.f0bc10fd
Quantifying Path Smoothness in Video Object Tracking by Detection
  • Jun 5, 2023
  • Proceedings of the Canadian Conference on Artificial Intelligence
  • Mohammed Gasmallah + 3 more

Object detection and tracking are important areas of research in computer vision. Computer vision solutions to object detection are typically single-frame solutions. To perform tracking by detection, these solutions typically run object detection on a per-frame basis, thus losing any temporal information from previous frames. Many multi-object tracking solutions report average precision on video datasets, but they do not evaluate the temporal qualities of these solutions. In video, not only is the detection of objects important, but so are the temporal motion attributes of an object's path, such as its velocity, acceleration, and jerk. Many implementations of object tracking by detection have run into the problem of motion smoothing for bounding box paths. This paper focuses on quantifying the smoothness of detected object paths within a temporal window. We propose using two smoothness metrics from the field of biokinematics and adapt them for use with detections. Finally, using these metrics, we evaluate the ground truth and two object detectors that were popular at the time of experimentation (YOLOv3 and RetinaNet) on the entire MOT17 dataset. The results show that the metrics are useful for determining object smoothness and provide an additional way to evaluate an algorithm's performance in object tracking. The experiments also demonstrate that YOLOv3 produces smoother bounding boxes than RetinaNet. All supplemental graphs and data are shown in our appendix.
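
The abstract does not name the two biokinematics metrics it adapts. As a simpler illustration of the underlying idea, the sketch below estimates jerk (the third time derivative of position) from a bounding-box centroid path by finite differences and reports its mean squared magnitude, where lower values mean a smoother path. This metric is an assumption chosen for illustration, not necessarily one of the paper's.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # bounding-box centroid (x, y) in pixels


def diff(values: Sequence[Point], dt: float) -> List[Point]:
    """Forward finite difference of a 2-D sequence sampled every dt seconds."""
    return [((b[0] - a[0]) / dt, (b[1] - a[1]) / dt) for a, b in zip(values, values[1:])]


def mean_squared_jerk(centroids: Sequence[Point], dt: float) -> float:
    """Mean squared jerk magnitude of a centroid path (needs at least 4 samples)."""
    if len(centroids) < 4:
        raise ValueError("need at least 4 positions to estimate jerk")
    velocity = diff(centroids, dt)
    acceleration = diff(velocity, dt)
    jerk = diff(acceleration, dt)
    return sum(jx * jx + jy * jy for jx, jy in jerk) / len(jerk)


if __name__ == "__main__":
    dt = 1 / 30  # hypothetical 30 fps video
    steady = [(10.0 * t, 5.0) for t in range(8)]                    # constant velocity
    jittery = [(10.0 * t, 5.0 + (t % 2) * 2.0) for t in range(8)]   # detector jitter
    print(mean_squared_jerk(steady, dt), mean_squared_jerk(jittery, dt))
```

A steady path scores zero, while a detector whose box centre flickers by a couple of pixels between frames scores far higher, which is the kind of difference such metrics are meant to expose.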

  • Conference Article
  • Cited by 37
  • 10.1109/cvpr.2015.7299187
Fine-grained classification of pedestrians in video: Benchmark and state of the art
  • Jun 1, 2015
  • David Hall + 1 more

A video dataset that is designed to study fine-grained categorisation of pedestrians is introduced. Pedestrians were recorded “in-the-wild” from a moving vehicle. Annotations include bounding boxes, tracks, 14 keypoints with occlusion information and the fine-grained categories of age (5 classes), sex (2 classes), weight (3 classes) and clothing style (4 classes). There are a total of 27,454 bounding box and pose labels across 4222 tracks. This dataset is designed to train and test algorithms for fine-grained categorisation of people; it is also useful for benchmarking tracking, detection and pose estimation of pedestrians. State-of-the-art algorithms for fine-grained classification and pose estimation were tested using the dataset and the results are reported as a useful performance baseline.

  • Research Article
  • Cited by 6
  • 10.1007/s11760-018-1242-8
Saliency detection in video sequences using perceivable change encoded local pattern
  • Jan 22, 2018
  • Signal, Image and Video Processing
  • K L Chan

The detection of salient objects in video sequences is an active computer vision research topic. One approach is to perform joint segmentation of objects and background: the background scene is learned and modeled, and a pixel is classified as salient if its features do not match the background model. The segmentation process faces many difficulties when the video sequence is captured under various dynamic circumstances. To tackle these challenges, we propose a novel local ternary pattern for background modeling. The features derived from the local pattern are robust to random noise, intensity scaling, and rotation. We also propose a novel scheme for matching a pixel with the background model within a spatiotemporal domain. Furthermore, we devise two feedback mechanisms for maintaining the quality of the result over a long video. First, the background model is updated immediately based on the background subtraction result. Second, the detected object is enhanced by adjusting the segmentation conditions in its proximity via a propagation scheme. We compare our method with state-of-the-art background subtraction algorithms on various video datasets.
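
The paper's novel pattern is not defined in the abstract. As a rough illustration of the general technique, the sketch below computes a standard local ternary pattern (LTP) for one pixel: each of the 8 neighbours is coded +1 if clearly brighter than the centre, -1 if clearly darker, and 0 otherwise. Making the tolerance proportional to the centre intensity is an assumption of this sketch that yields invariance to intensity scaling; the `matches_background` rule is likewise hypothetical, and rotation invariance is not implemented.

```python
from typing import List, Sequence

# 8-neighbourhood offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_code(image: Sequence[Sequence[float]], r: int, c: int, k: float = 0.1) -> List[int]:
    """Ternary pattern of pixel (r, c): +1 brighter, -1 darker, 0 similar.

    The tolerance t = k * centre (assumed) scales with intensity, so multiplying
    every pixel by the same positive factor leaves the code unchanged.
    Intensities are assumed positive and (r, c) must not lie on the border.
    """
    centre = image[r][c]
    t = k * centre
    code = []
    for dr, dc in OFFSETS:
        v = image[r + dr][c + dc]
        if v >= centre + t:
            code.append(1)
        elif v <= centre - t:
            code.append(-1)
        else:
            code.append(0)
    return code


def matches_background(code: List[int], model: List[int], max_mismatch: int = 2) -> bool:
    """Hypothetical rule: background if at most max_mismatch neighbour states differ."""
    return sum(a != b for a, b in zip(code, model)) <= max_mismatch


if __name__ == "__main__":
    patch = [[10, 20, 10], [10, 10, 10], [5, 10, 30]]    # hypothetical 3x3 patch
    brighter = [[3 * v for v in row] for row in patch]   # same scene, 3x gain
    print(ltp_code(patch, 1, 1), ltp_code(brighter, 1, 1))
```

Both calls yield the same code because the brighter patch is only a uniform intensity scaling of the first.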

  • Conference Article
  • Cited by 2
  • 10.23919/mva.2017.7986912
Saliency/non-saliency segregation in video sequences using perception-based local ternary pattern features
  • May 1, 2017
  • K L Chan

The detection of salient objects in video sequences is an active research area of computer vision. One approach is to perform joint segmentation of objects and background in each image frame of the video. The background scene is learned and modeled. Each pixel is classified as background if it matches the background model; otherwise, the pixel belongs to a salient object. The segregation method faces many difficulties when the video sequence is captured under various dynamic circumstances. To tackle these challenges, we propose a novel perception-based local ternary pattern for background modeling. The local pattern is fast to compute and is insensitive to random noise and to intensity scaling. The pattern feature is also invariant to rotation. We also propose a novel scheme for matching a pixel with the background model within a spatio-temporal domain. Furthermore, we devise two feedback mechanisms for maintaining the quality of the result over a long video. First, the background model is updated immediately based on the background subtraction result. Second, the detected object is enhanced by adjusting the segmentation conditions in its proximity via a propagation scheme. We compare our method with state-of-the-art background/foreground segregation algorithms on various video datasets.

  • Supplementary Content
  • Cited by 1
  • 10.4225/28/5b0c8d84e69b2
The vulnerability of microhylid frogs, Cophixalus spp., to climate change in the Australian Wet Tropics
  • Jan 1, 2018
  • Andrés Merino‐Viteri

  • Research Article
  • Cited by 53
  • 10.3389/fmars.2022.944582
Accelerating Species Recognition and Labelling of Fish From Underwater Video With Machine-Assisted Deep Learning
  • Aug 2, 2022
  • Frontiers in Marine Science
  • Daniel Marrable + 6 more

Machine-assisted object detection and classification of fish species from Baited Remote Underwater Video Station (BRUVS) surveys using deep learning algorithms presents an opportunity for optimising analysis time and rapid reporting of marine ecosystem statuses. Training object detection algorithms for BRUVS analysis presents significant challenges: the model requires training datasets with bounding boxes already applied identifying the location of all fish individuals in a scene, and it requires training datasets identifying species with labels. In both cases, substantial volumes of data are required and this is currently a manual, labour-intensive process, resulting in a paucity of the labelled data currently required for training object detection models for species detection. Here, we present a “machine-assisted” approach for i) a generalised model to automate the application of bounding boxes to any underwater environment containing fish and ii) fish detection and classification to species identification level, up to 12 target species. A catch-all “fish” classification is applied to fish individuals that remain unidentified due to a lack of available training and validation data. Machine-assisted bounding box annotation was shown to detect and label fish on out-of-sample datasets with a recall between 0.70 and 0.89 and automated labelling of 12 targeted species with an F1 score of 0.79. On average, 12% of fish were given a bounding box with species labels and 88% of fish were located and given a fish label and identified for manual labelling. Taking a combined, machine-assisted approach presents a significant advancement towards the applied use of deep learning for fish species detection in fish analysis and workflows and has potential for future fish ecologist uptake if integrated into video analysis software. 
Manual labelling and classification effort is still required, and a community effort to address the limitation presented by a severe paucity of training data would improve automation accuracy and encourage increased uptake.