De-Decay: Defusing Computer Vision Model Degradation through Scalable and Actionable Human-Data Alignment

Abstract

Computer Vision (CV) models can become outdated after deployment as real-world data evolves, requiring intensive attention from AI engineers to address degraded performance through tasks like data relabeling to update models with new human perceptions. Interactive human-in-the-loop systems have considerable potential to enhance model-steering practices. However, such workflows reveal two challenges: (1) scalability, where labor demands increase with data size, and (2) actionability, where human insights do not readily transform into model revisions. Based on our formative study (S1) on the current challenges faced by CV professionals, we developed De-Decay, an end-to-end Human-Data Alignment system offering scalable label-less assessment and actionable insight transformation. This enables engineers to investigate degradation and auto-retrain models with AI support, such as image clustering and regeneration. Our summative study (S2) showed that De-Decay helped engineers effectively identify and address CV degradation. We discuss how future research can enhance scalability and actionability in AI evaluation systems for aligning AI behaviors with human mental models.

Similar Papers
  • Research Article
  • Citations: 4
  • 10.1109/tpami.2003.1206508
Guest editors' introduction to the special section on graphical models in computer vision
  • Jul 1, 2003
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • J.M Rehg + 3 more

THE last 10 years have witnessed rapid growth in the popularity of graphical models, most notably Bayesian networks, as a tool for representing, learning, and computing complex probability distributions. Graphical models provide an explicit representation of the statistical dependencies between the components of a complex probability model, effectively marrying probability theory and graph theory. As Jordan puts it in [2], graphical models are “a natural tool for dealing with two problems that occur throughout applied mathematics and engineering—uncertainty and complexity—and, in particular, they are playing an increasingly important role in the design and analysis of machine learning algorithms.” Graphical models provide powerful computational support for the Bayesian approach to computer vision, which has become a standard framework for addressing vision problems. Many familiar tools from the vision literature, such as Markov random fields, hidden Markov models, and the Kalman filter, are instances of graphical models. More importantly, the graphical models formalism makes it possible to generalize these tools and develop novel statistical representations and associated algorithms for inference and learning. The history of graphical models in computer vision follows closely that of graphical models in general. Research by Pearl [3] and Lauritzen [4] in the late 1980s played a seminal role in introducing this formalism to areas of AI and statistical learning. Not long after, the formalism spread to fields such as statistics, systems engineering, information theory, pattern recognition, and, among others, computer vision. One of the earliest occurrences of graphical models in the vision literature was a paper by Binford et al. [1]. The paper described the use of Bayesian inference in a hierarchical probability model to match 3D object models to groupings of curves in a single image. 
The following year marked the publication of Pearl’s influential book [3] on graphical models. Since then, many technical papers have been published in IEEE journals and conference proceedings that address different aspects and applications of graphical models in computer vision. Our goal in organizing this special section was to demonstrate the breadth of applicability of the graphical models formalism to vision problems. Our call for papers in February 2002 produced 16 submissions. After a careful review process, we selected six papers for publication, including five regular papers, and one short paper. These papers reflect the state-of-the-art in the use of graphical models in vision problems that range from low-level image understanding to high-level scene interpretation. We believe these papers will appeal both to vision researchers who are actively engaged in the use of graphical models and machine learning researchers looking for a challenging application domain. The first paper in this section is “Stereo Matching Using Belief Propagation” by J. Sun, N.-N. Zheng, and H.-Y. Shum. The authors describe a new stereo algorithm based on loopy belief propagation, a powerful inference technique for complex graphical models in which exact inference is intractable. They formulate the dense stereo matching problem as MAP estimation on coupled Markov random fields and obtain promising results on standard test data sets. One of the benefits of this formulation, as the authors demonstrate, is the ease with which it can be extended to handle multiview stereo matching. In their paper “Statistical Cue Integration of DAG Deformable Models” S.K. Goldenstein, C. Vogler, and D. Metaxas describe a scheme for combining different sources of information into estimates of the parameters of a deformable model. They use a DAG representation of the interdependencies between the nodes in a deformable model. 
This framework supports the efficient integration of information from edges and other cues using the machinery of affine arithmetic and the propagation of uncertainties. They present experimental results for a face tracking application. Y. Song, L. Goncalves, and P. Perona describe, in their paper “Unsupervised Learning of Human Motion,” a method for learning probabilistic models of human motion from video sequences in cluttered scenes. Two key advantages of their method are its unsupervised nature, which can mitigate the need for tedious hand labeling of data, and the utilization of graphical model constraints to reduce the search space when fitting a human figure model. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 7, JULY 2003 785

  • Research Article
  • Citations: 18
  • 10.3390/su15086438
Pavement Distress Identification Based on Computer Vision and Controller Area Network (CAN) Sensor Models
  • Apr 10, 2023
  • Sustainability
  • Cuthbert Ruseruka + 5 more

Recent technological developments have attracted the use of machine learning technologies and sensors in various pavement maintenance and rehabilitation studies. To avoid excessive road damages, which cause high road maintenance costs, reduced mobility, vehicle damages, and safety concerns, the periodic maintenance of roads is necessary. As part of maintenance works, road pavement conditions should be monitored continuously. This monitoring is possible using modern distress detection methods that are simple to use, comparatively cheap, less labor-intensive, faster, safer, and able to provide data on a real-time basis. This paper proposed and developed two models: computer vision and sensor-based. The computer vision model was developed using the You Only Look Once (YOLOv5) algorithm for detecting and classifying pavement distresses into nine classes. The sensor-based model combined eight Controller Area Network (CAN) bus sensors available in most new vehicles to predict pavement distress. This research employed an extreme gradient boosting model (XGBoost) to train the sensor-based model. The results showed that the model achieved 98.42% and 97.99% area under the curve (AUC) metrics for training and validation datasets, respectively. The computer vision model attained an accuracy of 81.28% and an F1-score of 76.40%, which agree with past studies. The results indicated that both computer vision and sensor-based models proved highly efficient in predicting pavement distress and can be used to complement each other. Overall, computer vision and sensor-based tools provide cheap and practical road condition monitoring compared to traditional manual instruments.

  • Conference Article
  • 10.1145/3631908.3631927
A Framework for Tile Processing on Edge Servers for Roadside Traffic Surveillance
  • Oct 19, 2023
  • Saumya Jaipuria + 2 more

Roadside traffic monitoring is increasingly performed by deploying roadside high-resolution video cameras and then running Computer Vision (CV) models on the video data. Since computer vision models are compute-intensive as they utilize Deep Neural Networks (DNNs), the data is usually sent to one or more edge servers located adjacent to mobile base stations. Recent techniques propose running CV models on tiles of videos separately to detect and track small objects. Several CV models exist, each with different requirements of compute and memory. Since more compute and memory-intensive CV models provide higher accuracy, a key challenge of such techniques is to determine which vision model should be used on which tile. This becomes even more challenging if multiple videos are processed by the same edge server. In this paper, we first formulate this problem of model selection on edge devices as an Integer Linear Programming (ILP) problem, and then propose a heuristic to solve it. Our experiments show that it is quite effective in practice.
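
The model-selection problem in this abstract can be illustrated with a small sketch. The abstract gives neither the ILP nor the heuristic, so everything below is an assumption: the `Model` fields, the cost and accuracy numbers, and the greedy upgrade rule are hypothetical and are not the authors' method. The sketch starts every tile on the cheapest model, then repeatedly applies the single upgrade with the best accuracy gain per unit of extra compute until no upgrade fits the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost: float      # hypothetical compute cost of running the model on one tile
    accuracy: float  # hypothetical expected accuracy of the model on one tile


def select_models(num_tiles, models, budget):
    """Greedy sketch: pick one model per tile without exceeding the budget."""
    ranked = sorted(models, key=lambda m: m.cost)
    choice = [0] * num_tiles                # index into `ranked` for each tile
    spent = ranked[0].cost * num_tiles      # every tile starts on the cheapest model
    if spent > budget:
        raise ValueError("budget too small for even the cheapest model")
    while True:
        best = None  # (gain per extra cost, tile, new model index, extra cost)
        for tile, idx in enumerate(choice):
            for nxt in range(idx + 1, len(ranked)):
                extra = ranked[nxt].cost - ranked[idx].cost
                gain = ranked[nxt].accuracy - ranked[idx].accuracy
                if extra <= 0 or gain <= 0 or spent + extra > budget:
                    continue  # not an improvement, or it does not fit
                ratio = gain / extra
                if best is None or ratio > best[0]:
                    best = (ratio, tile, nxt, extra)
        if best is None:
            break  # no affordable upgrade remains
        _, tile, nxt, extra = best
        choice[tile] = nxt
        spent += extra
    return [ranked[i] for i in choice], spent


if __name__ == "__main__":
    # Made-up candidate models; the numbers are illustrative only.
    candidates = [
        Model("small", 1.0, 0.60),
        Model("medium", 2.0, 0.75),
        Model("large", 4.0, 0.85),
    ]
    chosen, spent = select_models(3, candidates, budget=7.0)
    print([m.name for m in chosen], spent)
```

A greedy rule like this is fast but not guaranteed to be optimal; an exact answer would require solving the ILP itself, and a realistic version would use per-tile accuracy estimates rather than one number per model.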

  • Research Article
  • Citations: 1
  • 10.4103/trp.trp_7_23
COMPUTER VISION AND MACHINE LEARNING FOR ASSESSING THYROID NODULE COMPOSITION: ADVANCING TOWARDS ML-THYROID IMAGING REPORTING AND DATA SYSTEM DEVELOPMENT
  • Oct 16, 2023
  • Thyroid Research and Practice
  • Om J Lakhani + 2 more

Background: Our study aims to develop a new algorithm that utilizes computer vision to accurately assess the composition of thyroid nodules and calculate the Thyroid Imaging Reporting & Data System (TIRADS) score based on uploaded nodule images. This prospective observational study evaluates the performance of our computer vision model in correctly classifying the composition of thyroid nodules. Methods: We developed a computer vision and machine learning model using the no-code AI tool “Levity” to classify the composition of thyroid nodules as “Solid”, “Cystic”, “Mixed solid-cystic”, or “Spongiform”. The model was trained on 200 thyroid nodule images labeled by an experienced endocrinologist. We tested the model on 50 additional images using an easy-to-use chatbot tool and assessed the clinician’s agreement with the diagnosis. The tool can be accessed for public use using the following link: https://tinyurl.com/2pfehpq8. Statistical analysis using the Chi-square test, Pearson correlation coefficient, and point-biserial correlation coefficient was performed using Python and R. A p value of less than 0.05 was considered statistically significant. Results: The computer model diagnosed 50 test images, with 34 solid, 5 mixed solid-cystic, 7 purely cystic, and 4 spongiform. Clinician agreement was 64% and partial agreement was 20%, resulting in 84% total agreement. The Chi-squared test showed a statistically significant relationship between model diagnosis and clinician agreement (p value of 0.0000059403). Pearson’s correlation test showed a statistically significant correlation between the percentage confidence of the model and clinician agreement (p value of 0.034). The model was confident in 56% of cases, with 60.7% confidence when the clinician agreed and 57.9% for partial agreement. Using the point-biserial correlation, a statistically significant correlation was found between the percentage value given by the model and agreement of the clinician (p value of 0.00182). Using the ROC curve, an optimal cutoff of 52.28% was determined for rejecting the model diagnosis. AUC was 0.85, indicating good performance. Conclusions: Our study successfully developed a computer vision and machine learning model using Levity to accurately classify thyroid nodule composition. Our model showed good performance with statistically significant results in clinician agreement, Chi-squared test, Pearson’s correlation test, and point-biserial correlation. The tool can be accessed for public use, and an optimal cutoff of 52.28% was determined for rejecting the model diagnosis.
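
For readers unfamiliar with how a confidence cutoff can be read off a ROC curve, here is a minimal sketch. The abstract does not say which criterion produced the 52.28% cutoff, so the use of Youden's J statistic (J = TPR - FPR) is an assumption, and the scores and labels are synthetic values made up for illustration, not data from the study.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A score >= threshold counts as a positive prediction. `labels` holds
    1 for a positive case (e.g. clinician agreed) and 0 otherwise.
    On ties, the lowest threshold is kept.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Synthetic model confidences (percent) and agreement labels.
    confidences = [20, 35, 40, 50, 55, 60, 75, 90]
    agreed = [0, 0, 1, 0, 1, 0, 1, 1]
    print(youden_cutoff(confidences, agreed))
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can give a different cutoff on the same data.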

  • Book Chapter
  • 10.5772/intechopen.1009359
Perspective Chapter: What Image AI is Close to Human Senses?
  • May 19, 2025
  • Hiroshi Omori

The development of Image AI has been remarkable. In addition to Computer Vision Models (CVMs) that use supervised learning on large-scale data such as ImageNet-21K, CVMs that use self-supervised learning and Contrastive Language Image Pre-training (CLIP) have been developed recently. We have been researching human environmental perception using photos. We measured visual similarity by having many participants manually classify similar photos. We had three photo sets: 100 garden landscapes, 242 cityscapes, and 200 student life photos. We investigated how closely 26 types of pre-trained CVMs matched the human sense. We also used SoftMax regression to select a synthetic CVM from these CVMs that best correlated with the visual similarity. The optimal CVM combinations varied significantly across photo sets. For garden landscapes, which only have garden photos, semantic segmentation and self-supervised CVMs were effective, while for student life photos, which have a wide variety of photos, supervised CVMs were effective. For cityscapes, which have intermediate variation, self-supervised CVMs were effective. MDS was used to examine in detail how the optimal CVM resembled human perception and how it differed. For garden landscapes, humans and the CVM agreed on the judgment of garden size but differed on the judgment of garden style. In the case of cityscapes and student life photos, the recognition of patterns in the photos was roughly consistent between humans and the CVM, but the CVM made more detailed classifications. It was also suggested that some of the differences between the two stem from human representation. Although some differences remain, we found that by combining CVMs effectively, it is possible to construct a CVM that is quite close to human senses.

  • Book Chapter
  • Citations: 245
  • 10.1007/bfb0028368
Markov random field models in computer vision
  • Jan 1, 1994
  • S Z Li

A variety of computer vision problems can be optimally posed as Bayesian labeling in which the solution of a problem is defined as the maximum a posteriori (MAP) probability estimate of the true labeling. The posterior probability is usually derived from a prior model and a likelihood model. The latter relates to how data is observed and is problem domain dependent. The former depends on how various prior constraints are expressed. Markov random field (MRF) theory is a tool to encode contextual constraints into the prior probability. This paper presents a unified approach for MRF modeling in low- and high-level computer vision. The unification is made possible due to a recent advance in MRF modeling for high-level object recognition. Such unification provides a systematic approach for vision modeling based on sound mathematical principles.
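
The MAP formulation described in this abstract can be written out as follows. The symbols are assumed notation, since the abstract itself defines none: $f$ is a labeling, $d$ the observed data, $\mathcal{C}$ the set of cliques of the neighborhood system, $V_c$ the clique potentials, $T$ a temperature constant, and $Z$ the normalizing constant. The Gibbs form of the prior is the standard consequence of the Hammersley-Clifford theorem for MRFs.

```latex
% MAP estimate: the posterior combines the likelihood and the prior (Bayes' rule)
f^{*} = \arg\max_{f} P(f \mid d)
      = \arg\max_{f} \; p(d \mid f)\, P(f)

% MRF prior in Gibbs form: contextual constraints enter through clique potentials
P(f) = \frac{1}{Z} \exp\!\left( -\frac{1}{T} \sum_{c \in \mathcal{C}} V_c(f) \right)
```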

  • Research Article
  • Citations: 2
  • 10.1016/s0262-8856(97)00084-x
Towards a method for parametrizing models of early vision using psychophysical data
  • May 1, 1998
  • Image and Vision Computing
  • Ela Claridge + 1 more

  • Research Article
  • Citations: 100
  • 10.1016/j.cviu.2016.04.009
Bio-inspired computer vision: Towards a synergistic approach of artificial and biological vision
  • Apr 29, 2016
  • Computer Vision and Image Understanding
  • N V Kartheek Medathati + 3 more

Studies in biological vision have always been a great source of inspiration for the design of computer vision algorithms. In the past, several successful methods were designed with varying degrees of correspondence with biological vision studies, ranging from purely functional inspiration to methods that utilise models that were primarily developed for explaining biological observations. Even though it seems well recognised that computational models of biological vision can help in the design of computer vision algorithms, it is a non-trivial exercise for a computer vision researcher to mine relevant information from the biological vision literature, as very few studies in biology are organised at a task level. In this paper we aim to bridge this gap by providing a computer vision task-centric presentation of models primarily originating in biological vision studies. Not only do we revisit some of the main features of biological vision and discuss the foundations of existing computational studies modelling biological vision, but we also consider three classical computer vision tasks from a biological perspective: image sensing, segmentation and optical flow. Using this task-centric approach, we discuss well-known biological functional principles and compare them with approaches taken by computer vision. Based on this comparative analysis of computer and biological vision, we present some recent models in biological vision and highlight a few models that we think are promising for future investigations in computer vision. To this end, this paper provides new insights and a starting point for investigators interested in the design of biology-based computer vision algorithms, and paves the way for much-needed interaction between the two communities, leading to the development of synergistic models of artificial and biological vision.

  • Research Article
  • 10.23939/cds2025.01.037
SELF-SUPERVISED VISION TRANSFORMERS FOR CROSS-MODAL LEARNING (REVIEW)
  • Jan 1, 2025
  • Computer Design Systems. Theory and Practice
  • Olena Stankevych + 1 more

Computer vision systems are increasingly expanding their application in visual data analysis. Model training methods are undergoing the greatest development and improvement, as the results of this stage significantly affect the final classification of objects and the interpretation of input information. Typically, computer vision systems use Convolutional Neural Networks (CNNs) for training. The disadvantages of such systems are significant limitations in cross-modal learning, multimodality implementation, labeling of large amounts of data, etc. One of the ways to overcome these problems is to use Vision Transformers (ViTs), which, compared to classical CNNs, have higher performance due to reduced inductive biases and high parallel computing efficiency. Introducing Self-Supervised Learning (SSL) technologies can significantly reduce the dependence on manually labeled data, contributing to the formation of generalized representations of images. Cross-Modal Learning (CML) expands the possibilities of processing them by combining data of different types. The development of a new approach that combines the capabilities of cross-modal learning and self-learning in ViTs in a single architecture will ensure adaptability, efficiency, and system scalability in various applications. The research aims to provide a detailed overview of ViTs, approaches to their architecture, and methods for improving their efficiency. The mathematical foundations of the key concepts of ViTs, cross-modal learning and self-learning, the main modifications of ViTs, and their integration with SSL and CML technologies are considered. A comparison of methods using characteristics, performance, and efficiency is provided. The key challenges and prospects facing researchers and developers while creating universal models in computer vision are outlined. ViTs change computer vision by capturing global dependencies in images. Despite some challenges, ViTs provide excellent scalability and performance for large datasets. The active search for methods to overcome their limitations makes ViTs a key tool for improving image classification, object detection, and other computer vision tasks.

  • Front Matter
  • Citations: 2
  • 10.1109/tpami.2015.2434651
Guest Editors' Introduction: Special Section on Higher Order Graphical Models in Computer Vision.
  • Jul 1, 2015
  • IEEE transactions on pattern analysis and machine intelligence
  • Karteek Alahari + 4 more

The papers in this special section address the programs and services supported by graphical models in computer vision. This section explores the main challenges in this framework—modeling novel priors, learning, inference—and presents innovative solutions. The papers cover the aspects of modeling novel priors, inference algorithms and parameter learning methods in the context of higher order graphical models.

  • Research Article
  • Citations: 46
  • 10.1016/j.joule.2022.09.011
DeepSolar++: Understanding residential solar adoption trajectories with computer vision and technology diffusion models
  • Nov 1, 2022
  • Joule
  • Zhecheng Wang + 4 more

  • Research Article
  • Citations: 13
  • 10.1177/1541931213601779
Mental Model Consensus and Shifts During Navigation System-Assisted Route Planning
  • Sep 1, 2017
  • Proceedings of the Human Factors and Ergonomics Society Annual Meeting
  • B S Perelman + 2 more

A major barrier to effective spatial decision-making in human-agent teams is that humans and algorithms use different mechanisms to solve spatial problems, frequently leading them to produce different solutions. Incongruity between algorithm-generated solutions and human spatial mental models results in higher workload in mixed-initiative systems, and potential breakdowns in trust and team situation awareness. Although these performance effects are well-understood, few methods exist for quantifying and comparing human spatial mental models and algorithm-generated solutions. To address these problems, 27 participants completed solutions to 5 spatial planning problems, before and after receiving assistance from 2 navigation algorithms. A novel path mapping and clustering approach provided a means of quantifying consensus in human mental models, and shifts in those mental models after viewing the algorithm-suggested routes. Human solutions clustered into a small number of shared mental models. Individual differences in trust in each algorithm predicted acceptance of that algorithm’s route.

  • Book Chapter
  • Citations: 15
  • 10.1007/978-3-031-25056-9_15
DEArt: Dataset of European Art
  • Jan 1, 2023
  • Artem Reshetnikov + 2 more

Large datasets that were made publicly available to the research community over the last 20 years have been a key enabling factor for the advances in deep learning algorithms for NLP or computer vision. These datasets are generally pairs of aligned image/manually annotated metadata, where images are photographs of everyday life. Scholarly and historical content, on the other hand, treats subjects that are not necessarily popular with a general audience; it may not always contain a large number of data points, and new data may be difficult or impossible to collect. Some exceptions do exist, for instance, scientific or health data, but this is not the case for cultural heritage (CH). The poor performance of the best models in computer vision - when tested over artworks - coupled with the lack of extensively annotated datasets for CH, and the fact that artwork images depict objects and actions not captured by photographs, indicate that a CH-specific dataset would be highly valuable for this community. We propose DEArt, at this point primarily an object detection and pose classification dataset meant to be a reference for paintings between the XIIth and the XVIIIth centuries. It contains more than 15000 images, about 80% non-iconic, aligned with manual annotations for the bounding boxes identifying all instances of 69 classes as well as 12 possible poses for boxes identifying human-like objects. Of these, more than 50 classes are CH-specific and thus do not appear in other datasets; these reflect imaginary beings, symbolic entities and other categories related to art. Additionally, existing datasets do not include pose annotations. Our results show that object detectors for the cultural heritage domain can achieve a level of precision comparable to state-of-the-art models for generic images via transfer learning.

Keywords: Deep learning, Computer vision, Cultural heritage, Object detection

  • Research Article
  • Citations: 37
  • 10.1016/j.diii.2022.06.002
3D convolutional neural network model from contrast-enhanced CT to predict spread through air spaces in non-small cell lung cancer
  • Jun 27, 2022
  • Diagnostic and Interventional Imaging
  • Junli Tao + 7 more

  • Single Book
  • Citations: 587
  • 10.1007/0-387-28831-7
Handbook of Mathematical Models in Computer Vision
  • Jan 1, 2006
  • Nikos Paragios + 2 more

This comprehensive volume is an essential reference tool for professional and academic researchers in the fields of computer vision, image processing, and applied mathematics. Continuing rapid advances in image processing have been enhanced by the theoretical efforts of mathematicians and engineers. This marriage of mathematics and computer vision - computational vision - has resulted in a discrete approach to image processing that is more reliable when applied to practical tasks. The volume provides a detailed discourse on the mathematical models used in computational vision from leading educators and active research experts in this field. Topical areas include: image reconstruction, segmentation and object extraction, shape modeling and registration, motion analysis and tracking, and 3D from images, geometry and reconstruction. The book also includes a study of applications in medical image analysis. Handbook of Mathematical Models in Computer Vision provides a graduate-level treatment of this subject as well as serving as a complete reference work for professionals.
