Towards Fine-Grained Sidewalk Accessibility Assessment with Deep Learning: Initial Benchmarks and an Open Dataset
We examine the feasibility of using deep learning to infer 33 classes of sidewalk accessibility conditions in pre-cropped streetscape images, including bumpy, brick/cobblestone, cracks, height difference (uplifts), narrow, uneven/slanted, pole, and sign. We present two experiments: first, a comparison between two state-of-the-art computer vision models, Meta's DINOv2 and OpenAI's CLIP-ViT, on a cleaned dataset of ∼24k images; second, an examination of a larger but noisier crowdsourced dataset (∼87k images) using the best-performing model from Experiment 1. Though preliminary, Experiment 1 shows that certain sidewalk conditions, such as missing tactile warnings on curb ramps and grass growing on sidewalks, can be identified with high precision and recall, while Experiment 2 demonstrates that larger but noisier training data can have a detrimental effect on performance. We contribute an open dataset and classification benchmarks to advance this important area.
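To make the benchmark setup concrete, here is a minimal sketch of linear probing on a frozen DINOv2 backbone for the 33-label task. The class count comes from the abstract; the multi-hot labels, 224×224 crop size, and training details are our assumptions, not the paper's exact configuration.

```python
# Minimal sketch: multi-label sidewalk-condition classification with a frozen
# DINOv2 backbone and a linear head (linear probing). Hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 33  # per the abstract: 33 sidewalk condition classes

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # train only the head

head = nn.Linear(768, NUM_CLASSES)      # ViT-B/14 emits 768-dim embeddings
criterion = nn.BCEWithLogitsLoss()      # multi-label: conditions can co-occur
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(images, labels):
    """images: [B, 3, 224, 224] normalized crops; labels: [B, 33] multi-hot."""
    with torch.no_grad():
        feats = backbone(images)        # [B, 768] CLS embeddings
    loss = criterion(head(feats), labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```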
- Research Article
4
- 10.1016/j.compbiomed.2025.110003
- May 1, 2025
- Computers in Biology and Medicine
CACTUS: An open dataset and framework for automated Cardiac Assessment and Classification of Ultrasound images using deep transfer learning.
- Research Article
23
- 10.1142/s0219843619500130
- Aug 1, 2019
- International Journal of Humanoid Robotics
Deep learning (DL) has made tremendous contributions to image processing. Recently, DL has also attracted attention in the specialized field of neural decoding from raw myoelectric signals (electromyograms, EMGs). However, to our knowledge, most existing methods require some measure of preprocessing of the raw EMGs. Moreover, research to date has not accounted for the variability of the signal across time sequences. In this paper, we propose a new convolutional neural network (CNN) structure that can directly process raw EMG signals for hand gesture classification. More specifically, we assess the effects of various window sizes and of two different EMG representations (time sequences and frequency spectra) on open EMG datasets. We found that frequency spectra derived from raw EMGs are more suitable as the model input for gesture classification. Meanwhile, combining them with long windows improved classification accuracy (CA), and a window of 1024 ms achieved the best results on two open datasets ([Formula: see text]% and [Formula: see text]%). Further, our model requires no feature extraction procedures and is comparable, on specific tasks, with the optimal combination of features and classifier used by traditional methods.
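As a rough illustration of the frequency-spectrum input the abstract favors, the sketch below slices raw multi-channel EMG into windows, takes magnitude spectra, and classifies them with a small 1-D CNN. The sampling rate, channel count, and network shape are assumptions; only the 1024 ms window length is taken from the abstract.

```python
# Sketch: windowed raw EMG -> magnitude spectra -> small CNN classifier.
import numpy as np
import torch.nn as nn

FS = 2000                      # Hz, assumed sampling rate
WIN = int(FS * 1024 / 1000)    # 1024 ms window, the best size reported

def emg_to_spectra(emg):
    """emg: [channels, samples] raw signal -> [n_windows, channels, bins]."""
    c, s = emg.shape
    n = s // WIN
    windows = emg[:, : n * WIN].reshape(c, n, WIN).transpose(1, 0, 2)
    return np.abs(np.fft.rfft(windows, axis=-1))   # magnitude spectra

class GestureCNN(nn.Module):
    def __init__(self, channels=8, n_gestures=10):  # assumed sizes
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten(),
            nn.Linear(32 * 32, n_gestures),
        )

    def forward(self, x):   # x: [batch, channels, freq_bins]
        return self.net(x)
```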
- Conference Article
2
- 10.1145/3373419.3373450
- Nov 8, 2019
The convolutional network is a very powerful visual model that can be used to detect objects in an image. Traditional target detection frameworks are generally divided into anchor-based and anchor-free object detectors. Among them, SSD is a single-stage anchor-based object detector that can detect objects quickly and efficiently. In order to detect weak and small infrared objects, we adapt the SSD network to our object detection tasks by using an improved backbone network. We use an open UAV dataset and achieve high training and testing accuracy on it.
- Research Article
17
- 10.3389/frwa.2023.1298465
- Dec 6, 2023
- Frontiers in Water
Supervised Deep Learning (DL) methods have shown promise in monitoring floating litter in rivers and urban canals, but further advancements are hard to obtain due to the limited availability of relevant labeled data. To address this challenge, researchers often utilize techniques such as transfer learning (TL) and data augmentation (DA). However, no study currently reports a rigorous evaluation of the effectiveness of these approaches for floating litter detection and their effects on the models' generalization capability. To overcome the problem of limited data availability, this work introduces the “TU Delft—Green Village” dataset, a novel labeled dataset of 9,473 camera and phone images of floating macroplastic litter and other litter items, captured during experiments in a drainage canal of TU Delft. We use the new dataset to conduct a thorough evaluation of the detection performance of five DL architectures for multi-class image classification, focusing the analysis on a systematic evaluation of the benefits of TL and DA on model performance. Moreover, we evaluate the generalization capability of these models for unseen litter items and new device settings, such as raising the cameras and tilting them to 45°. The results show that, for the specific problem of floating litter detection, fine-tuning all layers is more effective than the common approach of fine-tuning the classifier alone. Among the tested DA techniques, we find that simple image flipping boosts model accuracy the most, while the other methods have little impact on performance. The SqueezeNet and DenseNet121 architectures perform best, achieving overall accuracies of 89.6% and 91.7%, respectively. Both models retain good generalization capability, which drops significantly only in the most complex scenario tested, yet overall accuracy there rises to around 75% when a limited number of images is added to the training data, combined with flipping augmentation. The detailed analyses conducted here and the released open-source dataset offer valuable insights and serve as a precious resource for future research.
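The two transfer-learning regimes the study compares, and the flipping augmentation it found most effective, can be sketched as follows. DenseNet121 mirrors one of the best-performing architectures; the input size and flip probabilities are assumptions.

```python
# Sketch: classifier-only vs. all-layers fine-tuning, plus flip augmentation.
import torch.nn as nn
from torchvision import models, transforms

def make_model(n_classes, finetune_all_layers=True):
    model = models.densenet121(weights="IMAGENET1K_V1")
    for p in model.parameters():
        p.requires_grad = finetune_all_layers   # False = freeze the backbone
    # Fresh classifier head; its parameters always train.
    model.classifier = nn.Linear(model.classifier.in_features, n_classes)
    return model

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),  # the most effective DA found
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
])
```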
- Research Article
2
- 10.3390/sym12122094
- Dec 16, 2020
- Symmetry
Deep learning algorithms are becoming common in solving different supervised and unsupervised learning problems. Different deep learning algorithms were developed in the last decade to solve learning problems in domains such as computer vision, speech recognition, and machine translation. In the research field of computer vision, deep learning has become overwhelmingly popular. In solving computer-vision-related problems, we first take a CNN (Convolutional Neural Network) trained from scratch, or sometimes a pre-trained model that is further fine-tuned on the available dataset. Training a model from scratch on new datasets suffers from catastrophic forgetting: when a new dataset is used to train the model, it forgets the knowledge it obtained from the existing dataset. In other words, different datasets do not help the model increase its knowledge. The problem with pre-trained models is that most CNN models are trained on open datasets whose instances come from specific regions, which results in disturbing label predictions when the same model is used on datasets collected in a different region. Therefore, there is a need to reduce the geo-diversity gap in computer vision problems in the developing world. In this paper, we explore the problems of models trained from scratch and of models pre-trained on a large dataset, using a dataset specifically developed to understand geo-diversity issues in open datasets. The dataset contains images of different wedding scenarios in South Asian countries. We developed a Lifelong CNN that can incrementally increase its knowledge, i.e., it learns labels from the new dataset while retaining the existing knowledge of the open datasets. The proposed model demonstrates the highest accuracy compared to models trained from scratch or pre-trained models.
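One common ingredient of class-incremental learning, which the Lifelong CNN described above presumably realizes in a more sophisticated form, is growing the output layer for labels from the new dataset while copying over the weights learned on the open datasets. A minimal sketch:

```python
# Sketch: expand a classifier head with new classes, preserving old weights.
# Real lifelong-learning systems add rehearsal or distillation on top of this.
import torch
import torch.nn as nn

def expand_classifier(old_fc: nn.Linear, n_new_classes: int) -> nn.Linear:
    new_fc = nn.Linear(old_fc.in_features, old_fc.out_features + n_new_classes)
    with torch.no_grad():
        new_fc.weight[: old_fc.out_features] = old_fc.weight  # keep old knowledge
        new_fc.bias[: old_fc.out_features] = old_fc.bias
    return new_fc
```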
- Research Article
- 10.1007/s10845-025-02620-6
- May 25, 2025
- Journal of Intelligent Manufacturing
Nesting is pivotal in maximizing material use and productivity within manufacturing industries and involves the ordering, rotational placement, and translational placement of 2D irregular patterns onto raw material sheets. Despite its industrial significance, few methodologies tackle the challenging rotational placement problem due to its computational complexity. Unlike traditional search-based heuristic and meta-heuristic methods, this research pioneers a Deep Reinforcement Learning (DRL)-based framework that acquires a learning-based policy for flexible rotational placement and combines it with two rule-based policies to ensure a comprehensive nesting solution. Empowered by a bespoke Deep Learning (DL)-based geometric semantics extractor module, our approach achieves a 97% improvement in computation time and an 11% enhancement in material utilization compared with open-source nesting software on a dataset from the sheet metal industry. Additionally, it shows competitive, industry-practical performance against prevailing nesting algorithms on open datasets while being at least six times faster in computation time. Furthermore, this paper introduces a novel metric for geometrical irregularity, enriching the analysis and evaluation of nesting problems.
- Research Article
- 10.1142/s0218539324500591
- Jan 20, 2025
- International Journal of Reliability, Quality and Safety Engineering
At present, fault big data from open source software (OSS) are released as open datasets. The fault detection phenomenon, in particular, depends on the various operational situations of the OSS. Various software reliability growth models have been actively proposed by researchers in the past. This paper applies a deep learning approach to OSS fault big data and proposes several reliability assessment measures based on deep learning. In this approach, the estimation range is expanded by embedding a Wiener process in the data preprocessing. Furthermore, this paper proposes performability as a novel reliability assessment measure derived from the proposed deep learning model. In particular, we develop a prototype 3D reliability assessment tool. Several illustrative examples based on the developed prototype, using actual fault big data sets, are shown in this paper.
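A minimal sketch of the Wiener-process preprocessing as we read the abstract: jitter the cumulative fault-count curve with Brownian noise to expand the range of data available to the deep learning model. The drift-free form and the volatility value are assumptions.

```python
# Sketch: expand OSS fault-count data with Wiener-process sample paths.
import numpy as np

def wiener_augment(cum_faults, n_paths=100, sigma=0.5, seed=0):
    """cum_faults: 1-D array of cumulative fault counts per time unit."""
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, sigma, size=(n_paths, len(cum_faults)))
    noise = np.cumsum(increments, axis=1)        # Brownian motion paths
    return np.maximum(cum_faults + noise, 0.0)   # counts stay non-negative
```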
- Conference Article
2
- 10.1117/12.2682979
- Jun 16, 2023
The application of deep learning technology to target detection algorithms significantly improves their performance. Starting from traditional target detection algorithms, this paper summarizes the target detection task, including evaluation indices, open datasets, algorithm frameworks, and the defects of traditional algorithms. Existing target detection algorithms are classified according to two criteria: whether there is an explicit region proposal and whether a prior anchor box is defined. The evolutionary route of each algorithm is reviewed, and the mechanism, advantages, limitations, and application scenarios of each method are summarized. Finally, the performance of representative target detection algorithms on open datasets is compared and analyzed.
- Research Article
11
- 10.3390/diagnostics11111951
- Oct 21, 2021
- Diagnostics
In the automatic diagnosis of ocular toxoplasmosis (OT), Deep Learning (DL) has arisen as a powerful and promising approach. However, despite the good performance of the models, decision rules should be interpretable to elicit trust from the medical community. Therefore, developing an evaluation methodology to assess DL models based on interpretability methods is a challenging but necessary task for extending the use of AI among clinicians. In this work, we propose a novel methodology to quantify the similarity between the decision rules used by a DL model and those used by an ophthalmologist, based on the assumption that doctors are more likely to trust a prediction that was based on decision rules they can understand. Given an eye fundus image with OT, the proposed methodology compares the segmentation mask of OT lesions labeled by an ophthalmologist with the attribution matrix produced by interpretability methods. Furthermore, an open dataset that includes the eye fundus images and the segmentation masks is shared with the community. The proposal was tested on three different DL architectures. The results suggest that complex models tend to be less likely to be trusted while achieving better results in sensitivity and specificity.
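The core comparison, an attribution matrix against an expert lesion mask, might be scored as in the sketch below using a thresholded Dice overlap; the paper's actual similarity metric and threshold choice may differ.

```python
# Sketch: agreement between a saliency/attribution map and a lesion mask.
import numpy as np

def attribution_mask_similarity(attribution, lesion_mask, quantile=0.95):
    """attribution: [H, W] float saliency; lesion_mask: [H, W] binary."""
    thr = np.quantile(attribution, quantile)      # keep top-5% attributed pixels
    attr_bin = attribution >= thr
    inter = np.logical_and(attr_bin, lesion_mask).sum()
    return 2.0 * inter / (attr_bin.sum() + lesion_mask.sum() + 1e-8)  # Dice
```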
- Research Article
47
- 10.1038/s41598-023-41359-z
- Sep 2, 2023
- Scientific Reports
Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesize that deep learning applied to a structural neuroimaging dataset could detect disease-related alteration and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, and conventional T1-weighted MRI scan, from which we extracted the 3D whole-brain structure using standard post-processing methods. A deep learning model was then developed, optimized, and evaluated on three open datasets with T1-weighted MRI scans of patients with schizophrenia. Our proposed model outperformed the benchmark model, which was also trained with structural MR images using a 3D CNN architecture. Our model is capable of almost perfectly (area under the ROC curve = 0.987) distinguishing schizophrenia patients from healthy controls on unseen structural MRI scans. Regional analysis localized subcortical regions and ventricles as the most predictive brain regions. Subcortical structures serve a pivotal role in cognitive, affective, and social functions in humans, and structural abnormalities of these regions have been associated with schizophrenia. Our finding corroborates that schizophrenia is associated with widespread alterations in subcortical brain structure and the subcortical structural information provides prominent features in diagnostic classification. Together, these results further demonstrate the potential of deep learning to improve schizophrenia diagnosis and identify its structural neuroimaging signatures from a single, standard T1-weighted brain MRI.
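In the spirit of the study's whole-brain classifier, a minimal 3D CNN over T1-weighted volumes could look like the sketch below; the published model and the 3D CNN benchmark it outperformed are deeper and differ in training details.

```python
# Sketch: a small 3D CNN producing a schizophrenia-vs-control logit.
import torch.nn as nn

class Brain3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, x):   # x: [B, 1, D, H, W] whole-brain T1 volume
        return self.classifier(self.features(x))
```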
- Conference Article
284
- 10.1109/iccsn.2016.7586590
- Jun 1, 2016
Recently, deep learning has gained prominence due to the potential it portends for machine learning, and deep learning techniques have been applied in many fields, such as pattern recognition and classification. Intrusion detection analyzes data gathered from monitored security events to assess the state of a network. Many traditional machine learning methods have been put forward for intrusion detection, but detection performance and accuracy still need improvement. This paper discusses different methods for classifying network traffic. We applied these methods to an open dataset and experimented with them to find the best approach to intrusion detection.
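A hedged sketch of the comparison workflow: cross-validate several off-the-shelf classifiers on a preprocessed open intrusion-detection dataset. The feature matrix X, labels y, and the candidate models are illustrative assumptions; the abstract does not name its method set.

```python
# Sketch: score candidate classifiers on network-traffic features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def compare_classifiers(X, y):
    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200),
        "mlp": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300),
    }
    return {name: cross_val_score(clf, X, y, cv=5).mean()
            for name, clf in candidates.items()}
```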
- Research Article
6
- 10.1007/s00464-024-11341-5
- Oct 21, 2024
- Surgical Endoscopy
Background: Manual objective assessment of skill and errors in minimally invasive surgery has been validated through correlation with surgical expertise and patient outcomes. However, assessment and error annotation can be subjective and time-consuming processes, often precluding their use. Recent years have seen the development of artificial intelligence models that work towards automating the process to allow error reduction and truly objective assessment. This study aimed to validate surgical skill rating and error annotations in suturing gestures to inform the development and evaluation of AI models. Methods: The SAR-RARP50 open dataset was blindly and independently annotated at the gesture level for Robotic-Assisted Radical Prostatectomy (RARP) suturing. Manual objective assessment tools and an error annotation methodology, Objective Clinical Human Reliability Analysis (OCHRA), were used as ground truth to train and test vision-based deep learning methods that estimate skill and errors. Analysis included descriptive statistics plus tool validity and reliability. Results: Fifty-four RARP videos (266 min) were analysed. Strong/excellent inter-rater reliability (range r = 0.70–0.89, p < 0.001) and very strong correlation (r = 0.92, p < 0.001) between objective assessment tools were demonstrated. Skill estimation of OSATS and M-GEARS had Spearman's correlation coefficients of 0.37 and 0.36, respectively, with normalised mean absolute errors representing prediction errors of 17.92% (inverted "accuracy" 82.08%) and 20.6% (inverted "accuracy" 79.4%), respectively. The best-performing models in error prediction achieved a mean absolute precision of 37.14%, an area under the curve of 65.10%, and a Macro-F1 of 58.97%. Conclusions: This is the first study to employ detailed error detection methodology and deep learning models on real robotic surgical video. This benchmark evaluation of AI models sets a foundation and a promising approach for future advancements in automated technical skill assessment.
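The skill-estimation numbers quoted above (Spearman's correlation, normalised MAE, and the inverted "accuracy") can be computed from predictions as in the sketch below; normalising by the score range is our assumption about how the NMAE was obtained.

```python
# Sketch: Spearman correlation plus normalised MAE and its complement.
import numpy as np
from scipy.stats import spearmanr

def skill_metrics(pred, true, score_range):
    """pred/true: arrays of skill scores; score_range: max - min of the scale."""
    rho, _ = spearmanr(pred, true)
    nmae = np.abs(np.asarray(pred) - np.asarray(true)).mean() / score_range
    return {"spearman_rho": rho,
            "nmae_pct": 100 * nmae,
            "inverted_accuracy_pct": 100 * (1 - nmae)}
```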
- Research Article
14
- 10.14201/adcaij202110297122
- Feb 28, 2021
- ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
Nowadays, technology and computer science are rapidly developing many tools and algorithms, especially in the field of artificial intelligence. Machine learning is involved in the development of new methodologies and models, which have become a novel area of application for artificial intelligence. In addition to conventional neural network architectures, deep learning refers to the use of artificial neural network architectures that include multiple processing layers.
In this paper, convolutional neural network models were designed to detect (diagnose) plant disorders from samples of healthy and unhealthy plant images analyzed with deep learning methods. The models were trained using an open dataset containing 18,000 images of ten different plants, including healthy specimens. Several model architectures were trained, achieving a best performance of 97% in detecting [plant, disease] pairs. This is a very useful early-warning technique, and given its substantially high performance rate, the method can be further improved to support an automated plant disease detection system working in actual farm conditions.
- Research Article
6
- 10.1002/jmri.29088
- Oct 25, 2023
- Journal of Magnetic Resonance Imaging: JMRI
"Batch effect" in MR images, due to vendor-specific features, MR machine generations, and imaging parameters, challenges image quality and hinders deep learning (DL) model generalizability. We aim to develop a DL model using contrast adjustment and super-resolution to reduce diffusion-weighted images (DWIs) diversity across magnetic field strengths and imaging parameters. Retrospective. The DL model was built using an open dataset from one individual. The MR machine identification model was trained and validated on a dataset of 1134 adults (54% females, 46% males), with 1050 subjects showing no DWI abnormalities and 84 with conditions like stroke and tumors. The 21,000 images were divided into 80% for training, 20% for validation, and 3500 for testing. Seven MR scanners from four manufacturers with 1.5 T and 3 T magnetic field strengths. DWIs were acquired using spin-echo sequences and high-resolution T2WIs using the T2-SPACE sequence. An experienced, board-certified radiologist evaluated the effectiveness of restoring high-resolution T2WI and harmonizing diverse DWI with metrics such as PSNR and SSIM, and the texture and frequency attributes were further analyzed using gray-level co-occurrence matrix and 1-dimensional power spectral density. The model's impact on machine-specific characteristics was gauged through the performance metrics of a ResNet-50 model. Comprehensive statistical tests were employed for statistical robustness, including McNemar's test and the Dice index. Our DL protocol reduced DWI contrast and resolution variation. ResNet-50 model's accuracy decreased from 0.9443 to 0.5786, precision from 0.9442 to 0.6494, recall from 0.9443 to 0.5786, and F1 score from 0.9438 to 0.5587. The t-SNE visualization indicated more consistent image features across multiple MR devices. Autoencoder halved learning iterations; Dice coefficient >0.74 confirmed signal reproducibility in 84 lesions. This study presents a DL strategy to mitigate batch effects in diffusion MR images, improving their quality and generalizability. 3 TECHNICAL EFFICACY: Stage 1.
- Conference Article
2
- 10.1109/smc53654.2022.9945565
- Oct 9, 2022
In recent years, we have witnessed explosive growth of artificial intelligence and deep learning in medical applications. With the increased availability of medical images, deep learning tools can provide necessary diagnostic utility. However, current DNN models show high variation in their performance across medical image datasets. In this study, we proposed ensemble learning to achieve synergistic improvements in model accuracy and thereby provide highly stable performance on diverse medical datasets. We first investigated the performance of the latest deep learning architectures, e.g., Inception, VGGNet, MobileNet, Xception, and ResNet50, and selected seven state-of-the-art models for three diverse open CT datasets (SARS-COV-2 CT-Scan, UCSD CT, and COVID-X). The model parameters were transferred from another domain and fine-tuned on the medical image sets. The last convolutional layers were stacked, and a fully connected neural network was employed to find a generalized feature space. The peak accuracies of the fine-tuned single CNN models were InceptionV3 - 0.96, VGG16 - 0.94, VGG19 - 0.94, MobileNetV2 - 0.98, Xception - 0.90, ResNet - 0.96, and DenseNet201 - 0.97. The proposed ensemble model achieves a peak accuracy of 0.99, outperforming each individual model and achieving the highest performance on all three open CT datasets. Experimental results demonstrate that the proposed ensemble model is able to represent hierarchical features and thereby improves the stability and reproducibility of the classifier models.
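The stacking scheme, pooled features from fine-tuned backbones concatenated into a fully connected head, might look like the sketch below, with two backbones standing in for the paper's seven; head width and dropout are assumptions.

```python
# Sketch: concatenate pooled CNN features, then classify with an FC head.
import torch
import torch.nn as nn
from torchvision import models

class FeatureEnsemble(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        resnet = models.resnet50(weights="IMAGENET1K_V2")
        mobile = models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])             # [B, 2048, 1, 1]
        self.mobile = nn.Sequential(mobile.features, nn.AdaptiveAvgPool2d(1))  # [B, 1280, 1, 1]
        self.head = nn.Sequential(
            nn.Linear(2048 + 1280, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):   # x: [B, 3, 224, 224] CT slices
        feats = torch.cat([torch.flatten(self.resnet(x), 1),
                           torch.flatten(self.mobile(x), 1)], dim=1)
        return self.head(feats)
```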