Exploring deep neural networks for real-world ship detection using scaled model images and chroma key technology
Abstract: This paper presents the development and evaluation of a deep neural network model for the detection of a naval surface vessel using laboratory-generated datasets. By employing chroma-key technology, images of a scale model naval vessel were superimposed onto realistic maritime backgrounds to create a diverse training dataset. Fine-tuned with these datasets and evaluated using the YOLOv8 framework, the model achieved high precision and recall in identifying the naval surface vessel despite data limitations. This zero-shot learning approach, validated through extensive testing, supports visual navigation and target identification in GPS/RF-denied environments, advancing autonomous maritime operations and aligning with the United States Navy's strategy to leverage AI/ML for military enhancement.
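The compositing step described in the abstract can be illustrated with a minimal sketch. It assumes a green screen and a simple "green dominance" test per pixel; the function names, the margin value of 40, and the list-of-tuples image representation are illustrative assumptions and are not taken from the paper, whose actual pipeline is not detailed here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def is_key_colour(pixel: Pixel, green_margin: int = 40) -> bool:
    """Treat a pixel as green screen when green exceeds both red and blue
    by more than green_margin (an assumed, hand-picked threshold)."""
    r, g, b = pixel
    return g - max(r, b) > green_margin


def composite(foreground: Image, background: Image, green_margin: int = 40) -> Image:
    """Replace green-screen pixels of the foreground with the background pixel
    at the same position; all other foreground pixels are kept unchanged."""
    same_height = len(foreground) == len(background)
    same_widths = all(len(f) == len(b) for f, b in zip(foreground, background))
    if not (same_height and same_widths):
        raise ValueError("foreground and background must have the same size")
    return [
        [bg if is_key_colour(fg, green_margin) else fg for fg, bg in zip(f_row, b_row)]
        for f_row, b_row in zip(foreground, background)
    ]


# Toy 2x3 example: a grey "hull" shot against green, placed on a blue "sea".
GREEN: Pixel = (0, 255, 0)
HULL: Pixel = (120, 120, 130)
SEA: Pixel = (20, 60, 110)

model_shot: Image = [[GREEN, HULL, GREEN], [HULL, HULL, HULL]]
sea_scene: Image = [[SEA, SEA, SEA], [SEA, SEA, SEA]]
composited = composite(model_shot, sea_scene)
```

In practice a soft matte and edge blending would likely be needed to avoid green fringes around the model; this hard threshold only shows the basic idea.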
- Research Article
69
- 10.1037//0096-1523.24.3.745
- Jan 1, 1998
- Journal of experimental psychology. Human perception and performance
A target identification paradigm was used to study cross-modal spatial cuing effects on auditory and visual target identification. Each trial consisted of an auditory or visual spatial cue followed by an auditory or visual target. The cue and target could be either of the same modality (within-modality conditions) or of different modalities (between-modalities conditions). In 3 experiments, a larger cue validity effect was apparent on within-modality trials than on between-modalities trials. In addition, the likelihood of identifying a significant cross-modal cuing effect was observed to depend on the predictability of the cue-target relation. These effects are interpreted as evidence (a) of separate auditory and visual spatial attention mechanisms and (b) that target identification may be influenced by spatial cues of another modality but that this effect is primarily dependent on the engagement of endogenous attentional mechanisms.
- Research Article
36
- 10.3758/app.72.7.1938
- Oct 1, 2010
- Attention, Perception & Psychophysics
In the present study, participants identified the location of a visual target presented in a rapidly masked, changing sequence of visual distractors. In Experiment 1, we examined performance when a high tone, embedded in a sequence of low tones, was presented in synchrony with the visual target and observed that the high tone improved visual target identification, relative to a condition in which a low tone was synchronized with the visual target, thus replicating Vroomen and de Gelder's (2000, Experiment 1) findings. In subsequent experiments, we presented a single visual, auditory, vibrotactile, or combined audiotactile cue with the visual target and found similar improvements in participants' performance regardless of cue type. These results suggest that crossmodal perceptual organization may account for only a part of the improvement in participants' visual target identification performance reported in Vroomen and de Gelder's original study. Moreover, in contrast with many previous crossmodal cuing studies, our results also suggest that visual cues can enhance visual target identification performance. Alternative accounts for these results are discussed in terms of enhanced saliency, the presence of a temporal marker, and attentional capture by oddball stimuli as potential explanations for the observed performance benefits.
- Research Article
13
- 10.1007/s00345-020-03452-0
- Sep 22, 2020
- World Journal of Urology
Introduction and objective: Thermal injuries associated with Holmium laser lithotripsy of the urinary tract are an underestimated problem in stone therapy. Surgical precision relies exclusively on visual target identification when applying laser energy for stone disintegration. This study evaluates a laser system that enables target identification automatically during bladder stone lithotripsy, URS, and PCNL in a porcine animal model. Methods: Holmium laser lithotripsy was performed on two domestic pigs by an experienced endourology surgeon in vivo. Human stone fragments (4–6 mm) were inserted in both ureters, renal pelvises, and bladders. Ho:YAG laser lithotripsy was conducted as a two-arm comparison study, evaluating the target identification system against common lithotripsy. We assessed the ureters’ lesions according to PULS and the other locations descriptively. Post-mortem nephroureterectomy and cystectomy specimens were examined by a pathologist. Results: The sufficient disintegration of stone samples was achieved in both setups. Endoscopic examination revealed numerous lesions in the urinary tract after the commercial Holmium laser system. The extent of lesions with the feedback system was semi-quantitatively and qualitatively lower. The energy applied was significantly less, with a mean reduction of more than 30% (URS 27.1%, PCNL 52.2%, bladder stone lithotripsy 17.1%). Pathology examination revealed only superficial lesions in both animals. There was no evidence of organ perforation in either study arm. Conclusions: Our study provides proof-of-concept for a laser system enabling automatic real-time target identification during lithotripsy on human urinary stones. Further studies in humans are necessary, and to objectively quantify this new system’s advantages, investigations involving a large number of cases are mandatory.
- Conference Article
2
- 10.1117/12.2278864
- Oct 5, 2017
The need for automated visual content analysis has substantially increased due to the large number of images captured by surveillance cameras. With a focus on the development of practical methods for extracting effective visual data representations, deep neural network based representations have received great attention due to their success in visual categorization of generic images. For fine-grained image categorization, a closely related yet more challenging research problem than generic image categorization due to high visual similarities within subgroups, diverse applications have been developed, such as classifying images of vehicles, birds, food and plants. Here, we propose the use of deep neural network based representations for categorizing and identifying marine vessels for defense and security applications. First, we gather a large number of marine vessel images from online sources, grouping them into four coarse categories: naval, civil, commercial and service vessels. Next, we subgroup naval vessels into fine categories such as corvettes, frigates and submarines. For distinguishing images, we extract state-of-the-art deep visual representations and train support vector machines. Furthermore, we fine-tune deep representations for marine vessel images. Experiments address two scenarios, classification and verification of naval marine vessels. The classification experiment aims at coarse categorization, as well as learning models of fine categories. The verification experiment involves identification of specific naval vessels by determining whether a pair of images belongs to the same marine vessel with the help of learnt deep representations. Having obtained promising performance, we believe the presented capabilities would be essential components of future coastal and on-board surveillance systems.
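The pairwise verification scenario in this abstract can be sketched as comparing two feature vectors and thresholding their similarity. The abstract does not state how pairs are compared, so the use of cosine similarity, the 0.8 threshold, and the function names below are assumptions made only for illustration; the short vectors stand in for learnt deep representations.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def same_vessel(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Declare a match when similarity reaches the threshold.
    The threshold is an assumed value; in practice it would be tuned
    on held-out image pairs."""
    return cosine_similarity(a, b) >= threshold


# Hypothetical 3-dimensional stand-ins for deep features of two images.
image_a = [1.0, 0.0, 0.0]
image_b = [2.0, 0.0, 0.0]   # same direction as image_a
image_c = [0.0, 1.0, 0.0]   # orthogonal to image_a

match_ab = same_vessel(image_a, image_b)
match_ac = same_vessel(image_a, image_c)
```

A trained classifier on pair features would be an equally plausible reading of the abstract; the thresholded similarity is simply the smallest runnable form of the idea.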
- Conference Article
7
- 10.1109/icip.2018.8451482
- Oct 1, 2018
Retinal blood vessel detection is a crucial step in automatic retinal image analysis. Recently, deep neural networks have significantly advanced the state of the art for retinal blood vessel detection in color fundus (CF) images. Thus far, similar gains have not been seen in fluorescein angiography (FA) because the FA modality is entirely different from CF and annotated training data has not been available for FA imagery. We address retinal vessel detection in wide-field FA images with generative adversarial networks (GAN) via a novel approach for generating training data. Using a publicly available dataset that contains concurrently acquired pairs of CF and fundus FA images, vessel maps are detected in CF images via a pre-trained neural network and registered with fundus FA images via parametric chamfer matching to a preliminary FA vessel detection map. The co-aligned pairs of vessel maps (detected from CF images) and fundus FA images are used as ground truth labeled data for de novo training of a deep neural network for FA vessel detection. Specifically, we utilize adversarial learning to train a GAN where the generator learns to map FA images to binary vessel maps and the discriminator attempts to distinguish generated vs. ground-truth vessel maps. We highlight several important considerations for the proposed data generation methodology. The proposed method is validated on the VAMpIRE dataset, which contains high-resolution wide-field FA images and manual annotation of vessel segments. Experimental results demonstrate that the proposed method achieves an estimated ROC AUC of 0.9758.
- Research Article
37
- 10.1016/j.patcog.2021.107903
- Feb 20, 2021
- Pattern Recognition
Fooling deep neural detection networks with adaptive object-oriented adversarial perturbation
- Research Article
17
- 10.1080/00140130903483713
- Mar 22, 2010
- Ergonomics
A model of head movement contribution for gaze transitions
- Research Article
15
- 10.1007/s00221-012-3153-1
- Jul 4, 2012
- Experimental Brain Research
When identifying a rapidly masked visual target display in a stream of visual distractor displays, a high-frequency tone (presented in synchrony with the target display) in a stream of low-tone distractors results in better performance than when the same low tone accompanies each visual display (Ngo and Spence in Atten Percept Psychophys 72:1938-1947, 2010; Vroomen and de Gelder in J Exp Psychol Hum 26:1583-1590, 2000). In the present study, we tested three oddball conditions: a louder tone presented amongst quieter tones, a quieter tone presented amongst louder tones, and the absence of a tone, within an otherwise identical tone sequence. Across three experiments, all three oddball conditions resulted in the crossmodal facilitation of participants' visual target identification performance. These results therefore suggest that salient oddball stimuli in the form of deviating tones, when synchronized with the target, may be sufficient to capture participants' attention and facilitate visual target identification. The fact that the absence of a sound in an otherwise-regular sequence of tones also facilitated performance suggests that multisensory integration cannot provide an adequate account for the 'freezing' effect. Instead, an attentional capture account is proposed to account for the benefits of oddball cuing in Vroomen and de Gelder's task.
- Research Article
431
- 10.1016/j.cogsys.2019.12.005
- Jan 21, 2020
- Cognitive Systems Research
FNDNet – A deep convolutional neural network for fake news detection
- Conference Article
18
- 10.1109/icassp.2017.7952553
- Mar 1, 2017
This paper proposes an effective end-to-end face detection and recognition framework based on deep convolutional neural networks for home service robots. We combine the state-of-the-art region proposal based deep detection network with the deep face embedding network into an end-to-end system, so that the detection and recognition networks can share the same deep convolutional layers, enabling significant reduction of computation through sharing convolutional features. The detection network is robust to large occlusion, and scale, pose, and lighting variations. The recognition network does not require explicit face alignment, which enables an effective training strategy to generate a unified network. A practical robot system is also developed based on the proposed framework, where the system automatically asks for a minimum level of human supervision when needed, and no complicated region-level face annotation is required. Experiments are conducted over WIDER and LFW benchmarks, as well as a personalized dataset collected from an office setting, which demonstrate state-of-the-art performance of our system.
- Research Article
27
- 10.1109/21.7488
- May 1, 1988
- IEEE Transactions on Systems, Man, and Cybernetics
Results of two experiments in dynamic task allocation are discussed. Subjects performed two concurrent computer-based tasks: visual target identification and subcritical compensatory tracking. Target identification could be allocated dynamically to human or computer aid. Three aiding conditions were investigated: no aid, manual aid (with subjects making the allocation decision), and automatic aid (with allocation decisions based on models of human performance). The results indicated that: (1) overall performance was better with the aid available; (2) need for the aid depended on both current and previous task states; (3) unaided performance benefited from having an aid available, but only if subjects were in charge of task allocation; and (4) although overall performance was better with the automatic aid, subjects preferred the manual aid. The implications of these and other results are discussed.
- Research Article
23
- 10.7205/milmed-d-15-00576
- Jan 1, 2017
- Military Medicine
To compare visual performance, marksmanship performance, and threshold target identification following wavefront-guided (WFG) versus wavefront-optimized (WFO) photorefractive keratectomy (PRK). In this prospective, randomized clinical trial, active duty U.S. military Soldiers, age 21 or over, electing to undergo PRK were randomized to undergo WFG (n = 27) or WFO (n = 27) PRK for myopia or myopic astigmatism. Binocular visual performance was assessed preoperatively and 1, 3, and 6 months postoperatively: Super Vision Test high contrast, Super Vision Test contrast sensitivity (CS), and 25% contrast acuity with night vision goggle filter. CS function was generated testing at five spatial frequencies. Marksmanship performance in low light conditions was evaluated in a firing tunnel. Target detection and identification performance was tested for probability of identification of varying target sets and probability of detection of humans in cluttered environments. Visual performance, CS function, marksmanship, and threshold target identification demonstrated no statistically significant differences over time between the two treatments. Exploratory regression analysis of firing range tasks at 6 months showed no significant differences or correlations between procedures. Regression analysis of vehicle and handheld probability of identification showed a significant association with pretreatment performance. Both WFG and WFO PRK results translate to excellent and comparable visual and military performance.
- Research Article
24
- 10.3389/frobt.2020.00126
- Sep 29, 2020
- Frontiers in Robotics and AI
Environments in which Global Positioning Systems (GPS), or more generally Global Navigation Satellite System (GNSS), signals are denied or degraded pose problems for the guidance, navigation, and control of autonomous systems. This can make operating in hostile GNSS-Impaired environments, such as indoors, or in urban and natural canyons, impossible or extremely difficult. Pixel Processor Array (PPA) cameras—in conjunction with other on-board sensors—can be used to address this problem, aiding in tracking, localization, and control. In this paper we demonstrate the use of a PPA device—the SCAMP vision chip—combining perception and compute capabilities on the same device for aiding in real-time navigation and control of aerial robots. A PPA consists of an array of Processing Elements (PEs), each of which features light capture, processing, and storage capabilities. This allows various image processing tasks to be efficiently performed directly on the sensor itself. Within this paper we demonstrate visual odometry and target identification running concurrently on-board a single PPA vision chip at a combined frequency in the region of 400 Hz. Results from outdoor multirotor test flights are given along with comparisons against baseline GPS results. The SCAMP PPA's High Dynamic Range (HDR) and ability to run multiple algorithms at adaptive rates makes the sensor well suited for addressing outdoor flight of small UAS in GNSS challenging or denied environments. HDR allows operation to continue during the transition from indoor to outdoor environments, and in other situations where there are significant variations in light levels. Additionally, the PPA only needs to output specific information such as the optic flow and target position, rather than having to output entire images. This significantly reduces the bandwidth required for communication between the sensor and on-board flight computer, enabling high frame rate, low power operation.
- Research Article
7
- 10.1177/1541931218621432
- Sep 1, 2018
- Proceedings of the Human Factors and Ergonomics Society Annual Meeting
Although several studies have assessed the effect of business logo sign format on driver visual attention and performance, some concern has been expressed that findings may not be generalizable to other signage configurations. We conducted a driving simulation study to assess the effect of distance guide sign format on visual attention allocation, target detection accuracy, and driving performance considering driver demographics. Results revealed distance guide sign format, including random or distance-ordered presentation of destinations, to have no impact on driver visual attention, target identification, and vehicle control. However, elderly drivers had difficulty in identifying targets when destinations were presented in random order. In addition, elderly drivers exhibited conservative responses (i.e., reduced off-road visual attention and greater speed reductions) as compared to other age groups when exposed to distance guide signs. Findings support design guidance for on-road signage to account for driver demographics.
- Research Article
37
- 10.1177/0018720812446623
- May 18, 2012
- Human Factors: The Journal of the Human Factors and Ergonomics Society
In the present study, we sought to investigate whether auditory and tactile cuing could be used to facilitate a complex, real-world air traffic management scenario. Auditory and tactile cuing provides an effective means of improving both the speed and accuracy of participants' performance in a variety of laboratory-based visual target detection and identification tasks. A low-fidelity air traffic simulation task was used in which participants monitored and controlled aircraft. The participants had to ensure that the aircraft landed or exited at the correct altitude, speed, and direction and that they maintained a safe separation from all other aircraft and boundaries. The performance measures recorded included en route time, handoff delay, and conflict resolution delay (the performance measure of interest). In a baseline condition, the aircraft in conflict was highlighted in red (visual cue), and in the experimental conditions, this standard visual cue was accompanied by a simultaneously presented auditory, vibrotactile, or audiotactile cue. Participants responded significantly more rapidly, but no less accurately, to conflicts when presented with an additional auditory or audiotactile cue than with either a vibrotactile or visual cue alone. Auditory and audiotactile cues have the potential for improving operator performance by reducing the time it takes to detect and respond to potential visual target events. These results have important implications for the design and use of multisensory cues in air traffic management.