To what extent do DNN-based image classification models make unreliable inferences?
Deep Neural Network (DNN) models are widely used for image classification. While they offer high performance in terms of accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) classification results with different labels, or with the same labels but less certainty, after the relevant features of images are corrupted, and (b) classification results with the same labels after irrelevant features are corrupted. Inferences that violate the metamorphic relations are regarded as unreliable inferences. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive. Specifically, for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences, and found that they can cause significant degradation of a model’s overall accuracy. After including these unreliable inferences from the test set, the model’s accuracy can be significantly changed.
Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
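The two metamorphic relations described in this abstract can be sketched as simple checks on class-probability vectors. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, "less certainty" is assumed to mean a strictly lower probability for the originally predicted class, and how the relevant or irrelevant image regions are corrupted is left outside the sketch.

```python
from typing import Sequence


def top_label(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def violates_mr1(orig: Sequence[float], relevant_corrupted: Sequence[float]) -> bool:
    """MR-1 expects a different label, or the same label with less certainty,
    once the relevant features are corrupted. It is violated when the label is
    unchanged and its probability did not drop (assumed certainty measure)."""
    label = top_label(orig)
    same_label = top_label(relevant_corrupted) == label
    return same_label and relevant_corrupted[label] >= orig[label]


def violates_mr2(orig: Sequence[float], irrelevant_corrupted: Sequence[float]) -> bool:
    """MR-2 expects the same label once irrelevant features are corrupted."""
    return top_label(irrelevant_corrupted) != top_label(orig)


def is_unreliable(orig: Sequence[float],
                  relevant_corrupted: Sequence[float],
                  irrelevant_corrupted: Sequence[float]) -> bool:
    """Flag an inference that violates at least one relation."""
    return (violates_mr1(orig, relevant_corrupted)
            or violates_mr2(orig, irrelevant_corrupted))


if __name__ == "__main__":
    original = [0.1, 0.8, 0.1]
    # Corrupting the object left the prediction just as confident: suspicious.
    print(is_unreliable(original, [0.05, 0.9, 0.05], [0.2, 0.7, 0.1]))
```

Each relation is checked independently here, which matches the abstract's per-relation precision figures; combining them with `or` in `is_unreliable` is a convenience of this sketch.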
- Book Chapter
1
- 10.1007/978-981-16-9113-3_44
- Jan 1, 2022
Automated image classification is an essential task of the computer vision field. The tagging of images into a set of predefined groups is referred to as image classification. The implementation of computer vision to automate image classification would be beneficial because manual image evaluation and identification can be time-consuming, particularly when there are many images of different classes. Deep learning approaches have outperformed existing machine learning techniques in many fields in recent years, and computer vision is one of the most notable examples. The very deep neural network (VDCNN) is a powerful deep learning model for image classification, and this paper examines it briefly using the MNIST handwritten digit dataset. This dataset is used to demonstrate the efficacy of very deep neural networks over other deep learning models. The proposed study aims to comprehend the very deep neural network architecture used to accomplish a handwritten digit recognition task. The feasibility of the proposed model is evaluated using mean accuracy, validation accuracy, and standard deviation. The study results of the very deep neural network model are compared to a convolutional neural network and a convolutional neural network with batch normalization. According to the results of the comparison study, very deep neural networks achieve a high accuracy of 99.1% on the handwritten digit dataset. The outcome of the proposed work is used to interpret how well a very deep neural network performs when compared to the other two deep neural network models. The proposed architecture may be used to automate the classification of handwritten digit datasets.
- Abstract
- 10.1093/neuonc/noae064.716
- Jun 18, 2024
- Neuro-Oncology
BACKGROUND: Histologic examination is vital in oncology research and diagnostics. The adoption of digital scanning of whole slide images (WSI) has created an opportunity to leverage deep learning-based image classification methods to enhance diagnosis and risk stratification. However, technical limitations prevent training and deployment of accurate comprehensive multiclass deep convolutional neural networks (DCNN) models for histopathology image classification. The input dimensions of DCNN architectures are small compared to the typical pathologist field of view, degrading performance by excluding important architectural features. Furthermore, data requirements for comprehensive models are sufficiently large to overwhelm the system memory during training. METHODS: A method termed Learned Resizing with Efficient Training (LRET) was developed to address the main limitations of traditional histopathology classification model training. The LRET method couples efficient training techniques with image resizing to facilitate seamless integration of larger histology image patches into state-of-the-art classification models while preserving important structural information. The LRET method was coupled with two distinct resizing techniques to train three diverse histology image datasets using five different DCNN architectures. Performance metrics were compared on cross validation and hold out test sets. RESULTS: LRET-trained models were flexible to multiple input patch dimensions and DCNN models. We demonstrated performance improvement across all datasets while significantly reducing the training time and resources over traditional methods. Using a large-scale, multiclass brain tumor classification dataset consisting of 74 distinct histopathologic classes, LRET-trained models outperformed existing methods by 15-28% in accuracy, yielding 94% accuracy for the best model.
CONCLUSION: The LRET method for DCNN training significantly enhances the performance of large-scale multiclass histopathology image classification. The implications of this work extend to broader applications within medical imaging and beyond, where efficient integration of high-resolution images into deep learning pipelines is paramount for driving advancements in research and clinical practice.
- Addendum
24
- 10.1016/j.micpro.2020.103796
- Dec 29, 2020
- Microprocessors and Microsystems
RETRACTED: Research on environmental landscape design based on virtual reality technology and deep learning
- Conference Article
- 10.1109/ssci.2018.8628751
- Nov 1, 2018
Deep Convolutional Neural Networks have led to a series of breakthroughs in image classification. With increasing demand to run DCNN-based models on mobile platforms with minimal computing capabilities and less storage space, the challenge is optimizing those DCNN models for less computation and a smaller memory footprint. This paper presents a highly efficient and modularized Deep Neural Network (DNN) model for image classification, which outperforms state-of-the-art models in terms of both speed and accuracy. The proposed DNN model is constructed by repeating a building block that aggregates a set of transformations with the same topology. To make the model lighter, it uses depthwise separable convolution, grouped convolution and identity shortcut connections. It reduces computation by approximately 100M FLOPs in comparison to MobileNet, with a slight improvement in accuracy when validated on the CIFAR-10, CIFAR-100 and Caltech-256 datasets.
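To illustrate why depthwise separable and grouped convolutions make a model lighter, the following sketch counts multiply-accumulate operations (MACs) for a single layer using the standard textbook formulas. It assumes stride 1 and "same" padding so the output keeps the input's spatial size; the layer sizes are hypothetical and are not taken from the paper, which reports FLOPs for its full model.

```python
def standard_conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs of a k x k convolution producing an h x w x c_out output."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def grouped_conv_macs(k: int, c_in: int, c_out: int, h: int, w: int, groups: int) -> int:
    """MACs of a k x k convolution whose channels are split into groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out * h * w


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 32 x 32 feature map.
    std = standard_conv_macs(3, 64, 128, 32, 32)
    sep = depthwise_separable_macs(3, 64, 128, 32, 32)
    grp = grouped_conv_macs(3, 64, 128, 32, 32, groups=4)
    print(f"standard:  {std:,}")
    print(f"separable: {sep:,} ({sep / std:.3f} of standard)")
    print(f"grouped:   {grp:,} ({grp / std:.3f} of standard)")
```

For the depthwise separable case the cost ratio reduces to 1/c_out + 1/k², so with 3 x 3 kernels the saving approaches a factor of nine as the number of output channels grows.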
- Research Article
15
- 10.1016/j.dibe.2023.100144
- Mar 17, 2023
- Developments in the Built Environment
Fused deep neural networks for sustainable and computational management of heat-transfer pipeline diagnosis
- Book Chapter
1
- 10.1007/978-981-19-7184-6_25
- Jan 1, 2023
In the development of social economy and scientific and technological innovation, the image processing mode and classification model chosen by network technology platform is becoming more and more changeable, but in essence, it is necessary to obtain characteristic information in effective image recognition and choose high-quality network algorithm and processing technology to complete image processing and image classification. Therefore, on the basis of understanding the current research trend of computer image processing and image classification model methods, this paper conducts in-depth discussion on the image processing methods and image classification model training design with artificial intelligence as the core and takes the image classification model of transfer learning as an example for practical exploration. The final results show that the image processing method and image classification model based on artificial intelligence have strong performance advantages in practical application.
Keywords: Artificial intelligence; Image processing; Image classification; The migration study
- Research Article
24
- 10.1016/j.asr.2023.08.057
- Sep 4, 2023
- Advances in Space Research
A comparative evaluation of deep convolutional neural network and deep neural network-based land use/land cover classifications of mining regions using fused multi-sensor satellite data
- Research Article
64
- 10.1155/2022/3351256
- Jul 19, 2022
- Advances in Multimedia
Traditional artificial neural networks and machine learning methods struggle to meet the processing needs of massive image collections in feature extraction and model training, and they also show low efficiency and low classification accuracy when applied to image classification. Therefore, this paper proposed a deep learning model for image classification, which aimed to provide a foundation and support for image classification and recognition on large datasets. Firstly, based on an analysis of the basic theory of neural networks, this paper described the different types of convolutional neural network and the basic process of their application in image classification. Secondly, based on an existing convolutional neural network model, noise reduction and parameter adjustment were carried out in the feature extraction process, and an image classification deep learning model was proposed based on the improved convolutional neural network structure. Finally, the structure of the deep learning model was optimized to improve the classification efficiency and accuracy of the model. To verify the effectiveness of the proposed deep learning model in image classification, the relationship between the accuracy of several common network models and the number of iterations was compared through experiments. The results showed that the proposed model was better than the other models in classification accuracy. In addition, the classification accuracy of the deep learning model before and after optimization was compared and analyzed using the training set and test set. The results showed that the accuracy of image classification was greatly improved after the proposed model had been optimized.
- Conference Article
4
- 10.1109/mysurucon52639.2021.9641594
- Oct 24, 2021
Making computers detect a desired object has always been an area of interest for humans. Object detection can be implemented using the following stages: feature extraction and object localization, followed by identifying the object in the input image. Most present-day object detection work is focused on x86 and ARM architectures. Researchers constantly strive to identify better object detection architectures, update models, improve model accuracy or reduce prediction time. In this paper, multiple pre-trained Deep Neural Network (DNN) models, namely Region Based Convolutional Neural Network (RCNN), Fast RCNN, Faster RCNN, You Only Look Once (YOLO) V3 and Single Shot Multibox Detector (SSD), are used to identify fruits in a given input image on the RISC-V architecture. To bring uniformity across all DNN models, all of them are pre-trained on COCO datasets. Experimental results have shown that, out of the various DNN models tested for object recognition, YOLO and SSD-MobileNet give optimum performance in terms of accuracy and inference time on the RISC-V architecture.
- Research Article
26
- 10.1088/1361-6560/abc812
- Dec 11, 2020
- Physics in Medicine & Biology
Robustness is an important aspect when evaluating a method of medical image analysis. In this study, we investigated the robustness of a deep learning (DL)-based lung-nodule classification model for CT images with respect to noise perturbations. A deep neural network (DNN) was established to classify 3D CT images of lung nodules into malignant or benign groups. The established DNN was able to predict malignancy rate of lung nodules based on CT images, achieving the area under the curve of 0.91 for the testing dataset in a tenfold cross validation as compared to radiologists’ prediction. We then evaluated its robustness against noise perturbations. We added to the input CT images noise signals generated randomly or via an optimization scheme using a realistic noise model based on a noise power spectrum for a given mAs level, and monitored the DNN’s output. The results showed that the CT noise was able to affect the prediction results of the established DNN model. With random noise perturbations at 100 mAs, DNN’s predictions for 11.2% of training data and 17.4% of testing data were altered at least once. The percentage increased to 23.4% and 34.3%, respectively, for optimization-based perturbations. We further evaluated robustness of models with different architectures, parameters, number of output labels, etc., and robustness concerns were found in these models to different degrees. To improve model robustness, we empirically proposed an adaptive training scheme. It fine-tuned the DNN model by including in the training dataset perturbations that successfully altered the DNN’s predictions. The adaptive scheme was repeatedly performed to gradually improve DNN’s robustness. The numbers of perturbations at 100 mAs affecting DNN’s predictions were reduced to 10.8% for training and 21.1% for testing by the adaptive training scheme after two iterations.
Our study illustrated that robustness may potentially be a concern for an exemplary DL-based lung-nodule classification model for CT images, indicating the need for evaluating and ensuring model robustness when developing similar models. The proposed adaptive training scheme may be able to improve model robustness.
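The "altered at least once" percentages reported in this abstract can be read as a per-sample flip rate under repeated random perturbation. The sketch below shows one way to compute such a rate. It is only an illustration under stated assumptions: zero-mean Gaussian noise stands in for the paper's realistic CT noise model, and `toy_model` is a hypothetical threshold classifier, not the lung-nodule DNN.

```python
import random
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], int]


def flip_rate(model: Classifier,
              samples: Sequence[Sequence[float]],
              noise_std: float,
              n_trials: int,
              seed: int = 0) -> float:
    """Fraction of samples whose predicted label changes at least once
    across n_trials random noise perturbations."""
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)
    flipped = 0
    for x in samples:
        baseline = model(x)
        for _ in range(n_trials):
            # Assumption: additive Gaussian noise, not a CT noise power spectrum.
            noisy = [v + rng.gauss(0.0, noise_std) for v in x]
            if model(noisy) != baseline:
                flipped += 1
                break  # one altered prediction is enough for this sample
    return flipped / len(samples)


def toy_model(x: Sequence[float]) -> int:
    """Hypothetical stand-in classifier: thresholds the mean intensity."""
    return int(sum(x) / len(x) > 0.5)


if __name__ == "__main__":
    data = [[0.9, 0.9, 0.9, 0.9], [0.1, 0.1, 0.1, 0.1], [0.52, 0.5, 0.51, 0.5]]
    print(flip_rate(toy_model, data, noise_std=0.05, n_trials=20))
```

Counting a sample once, no matter how many perturbations flip it, mirrors the abstract's wording; reporting the share of individual perturbations that succeed would be a different metric.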
- Research Article
2
- 10.1515/nleng-2022-0194
- Jan 24, 2023
- Nonlinear Engineering
Recently, it has become a popular strategy in multi-label image recognition to predict those labels that co-occur in a picture. Previous work has concentrated on capturing label correlation but has neglected to correctly fuse picture features and label embeddings, which has a substantial influence on the model’s convergence efficiency and restricts future multi-label image recognition accuracy improvement. In order to better classify labeled training samples of corresponding categories in the field of image classification, a cross-modal multi-label image classification modeling and recognition method based on nonlinear is proposed. Multi-label classification models based on deep convolutional neural networks are constructed respectively. The visual classification model uses natural images and simple biomedical images with single labels to achieve heterogeneous transfer learning and homogeneous transfer learning, capturing the general features of the general field and the proprietary features of the biomedical field, while the text classification model uses the description text of simple biomedical images to achieve homogeneous transfer learning. The experimental results show that the multi-label classification model combining the two modes can obtain a hamming loss similar to the best performance of the evaluation task, and the macro average F1 value increases from 0.20 to 0.488, which is about 52.5% higher. The cross-modal multi-label image classification algorithm can better alleviate the problem of overfitting in most classes and has better cross-modal retrieval performance. In addition, the effectiveness and rationality of the two cross-modal mapping techniques are verified.
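For readers unfamiliar with the two metrics this abstract reports, the sketch below computes Hamming loss and macro-averaged F1 for multi-label predictions encoded as 0/1 indicator rows. The data are made up for illustration, and treating a label with no positives in either truth or prediction as F1 = 0 is a convention assumed here, not something stated in the abstract.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one row per image, one 0/1 entry per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = 0
    wrong = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: an undefined F1 (no positives at all) counts as 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


if __name__ == "__main__":
    truth = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
    print(f"Hamming loss: {hamming_loss(truth, pred):.4f}")
    print(f"Macro F1:     {macro_f1(truth, pred):.4f}")
```

Because macro averaging weights every label equally, a rare label that is never predicted pulls the score down sharply, which is why macro F1 is often reported alongside Hamming loss for imbalanced multi-label tasks.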
- Book Chapter
31
- 10.1007/978-3-031-33374-3_26
- Jan 1, 2023
In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, we predict the future weights according to the update rule of AdamW and then apply the predicted future weights to do both forward pass and backward propagation. In this way, the AdamW optimizer always utilizes the gradients w.r.t. the future weights instead of current weights to update the DNN parameters, making the AdamW optimizer achieve better convergence. Our proposal is simple and straightforward to implement but effective in boosting the convergence of DNN training. We performed extensive experimental evaluations on image classification and language modeling tasks to verify the effectiveness of our proposal. The experimental results validate that our proposal can boost the convergence of AdamW and achieve better accuracy than AdamW when training the DNN models.
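The idea described in this abstract, computing gradients at predicted future weights and then updating the current weights, can be sketched for a single scalar parameter. This is a rough reading of the abstract rather than the authors' algorithm: the `lookahead` factor, the choice to skip prediction on the very first step, and all hyperparameter values are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class AdamWState:
    m: float = 0.0  # first-moment estimate
    v: float = 0.0  # second-moment estimate
    t: int = 0      # number of updates applied so far


def adamw_delta(w: float, m: float, v: float, t: int,
                lr: float, b1: float, b2: float, eps: float, wd: float) -> float:
    """Size of one AdamW update (decoupled weight decay) for the given moments."""
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)


def step_with_prediction(w: float, grad_fn: Callable[[float], float],
                         state: AdamWState, lr: float = 1e-2,
                         b1: float = 0.9, b2: float = 0.999, eps: float = 1e-8,
                         wd: float = 1e-2, lookahead: int = 1) -> float:
    """One AdamW step whose gradient is evaluated at predicted future weights."""
    if state.t > 0:
        # Assumption: extrapolate by repeating the update implied by the
        # current moments `lookahead` times.
        w_pred = w - lookahead * adamw_delta(w, state.m, state.v, state.t,
                                             lr, b1, b2, eps, wd)
    else:
        w_pred = w  # no moment estimates yet, so nothing to predict from
    g = grad_fn(w_pred)  # gradient w.r.t. the predicted weights
    state.t += 1
    state.m = b1 * state.m + (1 - b1) * g
    state.v = b2 * state.v + (1 - b2) * g * g
    # The update itself is applied to the current weights, not the predicted ones.
    return w - adamw_delta(w, state.m, state.v, state.t, lr, b1, b2, eps, wd)


if __name__ == "__main__":
    # Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
    weight, opt_state = 0.0, AdamWState()
    for _ in range(500):
        weight = step_with_prediction(weight, lambda x: 2 * (x - 3), opt_state,
                                      lr=0.1, wd=0.0)
    print(weight)
```

In a real training loop the same pattern would apply per parameter tensor, with the forward and backward passes run on a temporary copy of the predicted weights.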
- Book Chapter
- 10.1007/978-3-031-31417-9_9
- Jan 1, 2023
Automated analysis of dermoscopic images for detecting malignant lesions can improve diagnostic performance and reduce premature deaths. While several automated classification algorithms using deep convolutional neural network (DCNN) models have been proposed, the need for performance improvement remains. The key limitations of developing a robust DCNN model for dermoscopic image classification are (a) the sub-sampling or pooling layer in a traditional DCNN has theoretical drawbacks in capturing object-part relationships, (b) increasing the network depth can improve performance but is prone to the vanishing gradient problem, and (c) due to the imbalanced dataset, the trained DCNN tends to be biased towards the majority classes. To overcome these limitations, we propose a novel deep Attention Residual Capsule Network (ARCN) for dermoscopic image classification to diagnose skin diseases. The proposed model combines the concepts of residual learning, the self-attention mechanism, and the capsule network. Residual learning is employed to address the vanishing gradient problem, the self-attention mechanism is employed to prioritize important features without using any extra learnable parameters, and the capsule network is employed to cope with information loss due to the sub-sampling (max-pooling) layer. To deal with the classifier’s bias toward the majority classes, a novel Mini-Batch-wise weight-balancing Focal Loss strategy is proposed. HAM10000, a benchmark dataset of dermoscopic images, is used to train the deep model and evaluate its performance. The ARCN-18 (modification of ResNet-18) network trained with the proposed loss produces an accuracy of 0.8206 for the considered test set.
- Conference Article
1
- 10.1109/dsa51864.2020.00088
- Nov 1, 2020
Deep Neural Networks (DNNs) have increasingly been applied to solve path planning problems in recent years. However, unexpected or incorrect behaviors of DNNs greatly threaten the reliability of DNN-based path planning algorithms. Therefore, their reliability should be evaluated through the software testing process. The quality of the training dataset is of great importance to pre-trained DNN models, and a pre-trained model may still lack generality if it was trained on a randomly generated and insufficient training dataset. DNN-based system testing also faces the oracle problem. Metamorphic Testing (MT) has shown considerable effectiveness in alleviating the oracle problem. To increase the reliability of DNN-based path planning algorithms, in this paper, we present a test technique specialized for DNN-based path planning algorithms based on metamorphic testing. We present a framework for systematically designing sixteen metamorphic relations (MRs) by combining input transformations and output relations. Experiments are carried out on a released commercial software system and demonstrate that our method is effective. The results show that our approach can effectively improve the diversity of test data, the accuracy of the DNN model, and the reliability of the software.
- Research Article
4
- 10.1038/s41598-022-20012-1
- Sep 30, 2022
- Scientific Reports
Deep neural networks (DNNs) have shown success in image classification, with high accuracy in recognition of everyday objects. Performance of DNNs has traditionally been measured assuming human accuracy is perfect. In specific problem domains, however, human accuracy is less than perfect and a comparison between humans and machine learning (ML) models can be performed. In recognising everyday objects, humans have the advantage of a lifetime of experience, whereas DNN models are trained only with a limited image dataset. We have tried to compare performance of human learners and two DNN models on an image dataset which is novel to both, i.e. histological images. We thus aim to eliminate the advantage of prior experience that humans have over DNN models in image classification. Ten classes of tissues were randomly selected from the undergraduate first year histology curriculum of a Medical School in North India. Two machine learning (ML) models were developed based on the VGG16 (VML) and Inception V2 (IML) DNNs, using transfer learning, to produce a 10-class classifier. One thousand (1000) images belonging to the ten classes (i.e. 100 images from each class) were split into training (700) and validation (300) sets. After training, the VML and IML model achieved 85.67 and 89% accuracy on the validation set, respectively. The training set was also circulated to medical students (MS) of the college for a week. An online quiz, consisting of a random selection of 100 images from the validation set, was conducted on students (after obtaining informed consent) who volunteered for the study. 66 students participated in the quiz, providing 6557 responses. In addition, we prepared a set of 10 images which belonged to different classes of tissue, not present in training set (i.e. out of training scope or OTS images). A second quiz was conducted on medical students with OTS images, and the ML models were also run on these OTS images. 
The overall accuracy of MS in the first quiz was 55.14%. The two ML models were also run on the first quiz questionnaire, producing accuracy between 91 and 93%. The ML models scored higher than 80% of medical students. Analysis of confusion matrices of both ML models and all medical students showed dissimilar error profiles. However, when comparing the subset of students who achieved accuracy similar to the ML models, the error profile was also similar. Recognition of ‘stomach’ proved difficult for both humans and ML models. In four images in the first quiz set, both the VML model and medical students produced highly equivocal responses. Within these images, a pattern of bias was uncovered: the tendency of medical students to misclassify ‘liver’ tissue. The ‘stomach’ class proved most difficult for both MS and VML, producing 34.84% of all errors of MS, and 41.17% of all errors of the VML model; however, the IML model committed most errors in recognising the ‘skin’ class (27.5% of all errors). Analysis of the convolution layers of the DNN outlined features in the original image which might have led to misclassification by the VML model. In OTS images, however, the medical students produced a better overall score than both ML models, i.e. they successfully recognised patterns of similarity between tissues and could generalise their training to a novel dataset. Our findings suggest that within the scope of training, ML models perform better than 80% of medical students, with a distinct error profile. However, students who have reached accuracy close to the ML models tend to replicate the error profile of the ML models. This suggests a degree of similarity between how machines and humans extract features from an image. If asked to recognise images outside the scope of training, humans perform better at recognising patterns and likeness between tissues.
This suggests that ‘training’ is not the same as ‘learning’, and humans can extend their pattern-based learning to different domains outside of the training set.