A memory model for image recognition and classification based on convolutional neural network and Bayesian decision
This paper introduces a memory-based image recognition model combining convolutional neural networks and Bayesian decision, which extracts binary features, models storage and retrieval, and outperforms methods like SRC and ELM on Caltech datasets by achieving higher hit probabilities and lower false alarm rates.
Most popular image classification methods focus on classification ability rather than on recognizing new things. Humans, however, place emphasis on cognition first and classification second, which is closely related to the human memory system. Although many memory models have been proposed, they have been studied mainly on word lists, and reports on natural images are still limited. This paper proposes a memory model for image recognition and classification based on a convolutional neural network and Bayesian decision. First, the image feature is extracted by a convolutional neural network and stored in binary form. Then the representation, storage and retrieval processes of visual images are modeled. The test image feature vector is matched in parallel against the studied image vectors, and the likelihood values are calculated. Finally, the odds that the test image belongs to a new class are computed from all likelihood values. If the odds value is greater than a certain threshold, the test image is regarded as new; otherwise, the Bayesian decision rule for image classification is applied. Experimental results on the Caltech-101 and Caltech-256 datasets show that the proposed method performs well in image recognition and classification tasks. Its hit probability is higher than that of two typical methods, SRC and ELM, while its false alarm rate is far lower than theirs.
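The abstract does not give the model's formulas, so the following is only a minimal sketch of the odds-then-classify idea under stated assumptions. It assumes each stored bit was copied correctly with a hypothetical probability `c` (with 0 < c < 1) and was otherwise a random bit, takes the odds for "old" as the mean likelihood ratio over stored vectors, and uses the reciprocal as the odds for "new". The function names, the value of `c`, the threshold, and the class-selection rule (largest summed likelihood ratio) are illustrative choices, not the authors' implementation.

```python
from math import prod

def likelihood_ratio(test, stored, c=0.8):
    """Ratio P(bits | same image) / P(bits | different image) for two binary vectors.

    Assumption: a stored bit is a correct copy with probability c, otherwise a
    uniformly random bit, so an unrelated image matches each bit with probability 0.5.
    """
    match = (c + (1 - c) * 0.5) / 0.5
    mismatch = ((1 - c) * 0.5) / 0.5
    return prod(match if t == s else mismatch for t, s in zip(test, stored))

def recognize(test, memory, threshold=1.0, c=0.8):
    """Return "new" or the label of the best-matching studied class.

    memory is a list of (label, binary_vector) pairs for the studied images.
    """
    ratios = [(label, likelihood_ratio(test, vec, c)) for label, vec in memory]
    old_odds = sum(r for _, r in ratios) / len(ratios)  # mean likelihood ratio
    new_odds = 1.0 / old_odds                           # odds in favor of "new"
    if new_odds > threshold:
        return "new"
    # Otherwise pick the class whose stored vectors explain the test vector best.
    totals = {}
    for label, r in ratios:
        totals[label] = totals.get(label, 0.0) + r
    return max(totals, key=totals.get)

memory = [
    ("cat", [1, 0, 1, 1, 0, 0, 1, 0]),
    ("dog", [0, 1, 0, 0, 1, 1, 0, 1]),
]
studied_result = recognize([1, 0, 1, 1, 0, 0, 1, 0], memory)
unseen_result = recognize([1, 0, 1, 1, 1, 1, 0, 1], memory)
```

In this toy setup a test vector identical to a stored one yields a large mean likelihood ratio and is classified, while a vector that half-matches every stored item yields odds for "new" above the threshold.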
- Conference Article
- 10.1109/ssci44817.2019.9002860
- Dec 1, 2019
Although traditional classification methods perform well in classification tasks, most of them emphasize ‘classification’ rather than ‘cognition’. When a new object that has never been seen is encountered, traditional methods wrongly assign the image to a category that has been studied, whereas humans can first identify the image as new. In this paper, we present a memory model for visual image classification based on a residual neural network and Bayesian decision (VICRB). First, the feature vectors of the visual images for each category are extracted with a residual neural network, and each feature component may be correctly copied or randomly produced. Then the processes by which visual images are represented, stored and retrieved are modeled. The feature vector of a test image is matched with the feature vectors of learned images, and the likelihood ratio is computed according to probabilistic inference theory. Finally, the odds value in favor of an old over a new image is computed from all likelihood values. According to the odds value, the Bayesian decision rule is applied to image classification. Experimental results on two benchmark image datasets show that the presented memory model performs well in image classification tasks.
- Research Article
5
- 10.54097/hset.v15i.2222
- Nov 26, 2022
- Highlights in Science, Engineering and Technology
With deep learning (DL) sweeping the world, traditional image classification methods struggle to process huge volumes of image data and cannot meet requirements for classification accuracy and speed. Image classification based on convolutional neural networks (CNNs) breaks through the bottleneck of traditional methods and has become the mainstream approach; how to use CNNs effectively to classify images is now a hot research topic in computer vision at home and abroad. CNNs have performed well in image classification and segmentation, target detection and other applications, and their powerful feature learning and feature expression capabilities are increasingly valued by researchers. However, CNNs still have some problems, such as incomplete feature extraction and overfitting during training. In view of these problems, and after in-depth research on the application of CNNs in image processing, this paper presents the mainstream model structures used in CNN-based image classification, their advantages and disadvantages, their time and space complexity, and the problems that may be encountered during model training together with corresponding solutions. By surveying the research status of CNN models in image classification, it offers suggestions for the further development and research direction of CNNs.
- Conference Article
- 10.1117/12.3086835
- Jan 22, 2026
Achieving high accuracy in remote sensing image classification remains a challenge for convolutional neural networks (CNNs), particularly when distinguishing between similar classes. This study proposes an efficient CNN-based memory modeling method combined with Bayesian decision theory for remote sensing image classification. The method first uses a CNN to extract image features and store them in binary form. These features are then expressed, stored, and retrieved through a memory modeling process, where the test image feature vector is matched against stored vectors to calculate likelihood values. The probability of the test image belonging to a new category is derived from these likelihoods; if the probability exceeds a threshold, the image is classified as a new category, otherwise the Bayesian decision rule is applied. Experimental results show that the proposed method achieves a higher hit rate and a significantly lower false alarm rate (30.5%) compared with two baseline methods (73.7% and 100%). Tests on the Caltech-101 and Caltech-256 datasets confirm its superior performance over representative methods such as sparse representation classification (SRC) and extreme learning machine (ELM), making it highly effective for remote sensing image recognition and classification.
- Research Article
2
- 10.54254/2755-2721/81/20241009
- Nov 8, 2024
- Applied and Computational Engineering
This paper reviews the application and improvement of convolutional neural networks (CNNs) in image classification. First, a shallow CNN for interstitial lung disease image classification is presented; this model suppresses overfitting through a unique network architecture and optimisation algorithm. Next, the improved VGG16 architecture and the MIDNet18 model are discussed, and their superior performance in brain tumour image classification is demonstrated. A CNN-CapsNet model for cervical cancer image classification and its improvement are then presented, and the customised model is compared with the conventional VGG-16 CNN architecture. The paper goes on to present the application of sparse convolutional kernels and hybrid sparse convolutional kernels (HDCs) to the problem of computational resource consumption. Methods for addressing limited training data through transfer learning and network data augmentation techniques are then discussed, as well as GAN-generated datasets for addressing overfitting. Finally, the effect of degraded images on the classification effectiveness of CNNs is explored. The results show that the improved CNN architectures and algorithms are effective against overfitting and computational resource consumption, and can significantly improve the accuracy and efficiency of image classification. Degraded images, in turn, adversely affect the accuracy of CNNs for image classification.
- Research Article
1
- 10.1142/s0219467825500196
- Aug 3, 2023
- International Journal of Image and Graphics
As an important form of expression in modern art, printmaking has a rich variety of types and a prominent sense of artistic hierarchy, and it is highly favored around the world for its unique artistic characteristics. Classifying print types through image feature elements will improve people’s understanding of print creation. Convolutional neural networks (CNNs) perform well in image classification, so a CNN is used here for printmaking analysis. Because the classification performance of the traditional convolutional image classification model is easily affected by the activation function, the T-ReLU activation function is introduced. By using adjustable parameters to enhance the soft saturation characteristics of the model and avoid vanishing gradients, a T-ReLU convolutional model is constructed. Building on the T-ReLU convolutional model, an improved convolutional image classification model is proposed to address subpar multi-level feature fusion in deep convolutional image classification models. It uses normalization to process the visual input, an eleven-layer convolutional network with residual units in the convolutional layers, and a cascading approach to fusion to address the defects of the convolutional network. Performance tests on artificial prints of different styles showed that the GT-ReLU model obtains the best image classification accuracy, 0.978. The GT-ReLU model also maintains a classification accuracy above 94.4% in tests across multiple datasets, which is higher than that of the other image classification models. The research provides a useful reference for the use of visual processing technology in classifying prints.
- Book Chapter
1
- 10.1007/978-981-19-7184-6_25
- Jan 1, 2023
In the development of social economy and scientific and technological innovation, the image processing mode and classification model chosen by network technology platform is becoming more and more changeable, but in essence, it is necessary to obtain characteristic information in effective image recognition and choose high-quality network algorithm and processing technology to complete image processing and image classification. Therefore, on the basis of understanding the current research trend of computer image processing and image classification model methods, this paper conducts in-depth discussion on the image processing methods and image classification model training design with artificial intelligence as the core and takes the image classification model of transfer learning as an example for practical exploration. The final results show that the image processing method and image classification model based on artificial intelligence have strong performance advantages in practical application. Keywords: Artificial intelligence; Image processing; Image classification; The migration study
- Research Article
3
- 10.1109/jstars.2024.3469728
- Jan 1, 2025
- IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Based on studies using high-medium resolution images, convolutional neural networks (CNNs) and semantic segmentation have shown superiority over classical machine learning (ML), particularly in small-scale mapping. However, few, if any, studies have assessed these techniques on coarse-resolution image classification for extensive area land cover mapping. In this study, we evaluated the performance and feasibility of three CNN models (1-D CNN, 2-D CNN, and 3-D CNN) and U-net for coarse-resolution satellite image classification and compared them to a random forest (RF) classifier. We utilized time-series, coarse-resolution (1 km) composite imagery acquired by the FengYun-3C visible and infrared radiometer. Labeled datasets were collected as shapefiles, split into three independent datasets (training, validation, and test), and preprocessed to meet each model's input format requirements. We conducted several experiments to optimize the models and select the best ones. Then, the best models were evaluated on an unseen dataset. Among the DL models, the one-dimensional (1-D) CNN achieved the highest overall accuracy (OA) of 0.87 and kappa (k) of 0.84, 2% higher than the best results attained by the 2-D CNN, 3-D CNN, and U-net models. However, the 1-D CNN is outperformed by RF, which achieved 0.89 (OA) and 0.87 (k). Achieving the best and the second-best results using the RF and 1-D CNN models, respectively, indicates the superiority of the pixel-based method and the insignificance of spatial information in coarse-resolution image classification. Furthermore, although the DL models can yield high accuracy, especially the 1-D CNN, they are less feasible than RF classifiers for coarse-resolution satellite image classification in extensive area land cover mapping.
- Research Article
3
- 10.5121/ijcsit.2024.16102
- Feb 28, 2024
- International Journal of Computer Science and Information Technology
Neural networks, specifically Convolutional Neural Networks (CNNs), have proven highly effective for image segmentation and classification tasks in computer vision. These tasks have numerous practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable of learning complex features directly from images and achieving outstanding performance across several datasets. In this work, we have utilized three different datasets to investigate the efficacy of various preprocessing and classification techniques in accurately segmenting and classifying different structures within MRI and natural images. We have utilized both sample gradient and Canny Edge Detection methods for pre-processing, and K-means clustering has been applied to segment the images. Image augmentation increases the size and diversity of the datasets used to train the image classification models. This work highlights the effectiveness of transfer learning in image classification using CNNs and VGG 16, and provides insights into the selection of pre-trained models and hyperparameters for optimal performance. We have proposed a comprehensive approach for image segmentation and classification that incorporates preprocessing techniques, the K-means algorithm for segmentation, and deep learning models such as CNN and VGG 16 for classification.
- Research Article
15
- 10.4236/jcc.2024.124005
- Jan 1, 2024
- Journal of Computer and Communications
This research introduces an innovative approach to image classification that makes use of the Vision Transformer (ViT) architecture. Vision Transformers have emerged as a promising alternative to convolutional neural networks (CNNs) for image analysis tasks, offering scalability and improved performance. ViT models are able to capture global dependencies and links among elements of images, which enhances feature representation. When the ViT model is trained on different models, it demonstrates strong classification capabilities across different image categories. The ViT’s ability to process image patches directly, without relying on spatial hierarchies, streamlines the classification process and improves computational efficiency. In this research, we present a Python implementation using TensorFlow that employs the ViT model for image classification. Images from four animal categories (cow, dog, horse and sheep) are used for classification. The ViT model is used to extract meaningful features from images, and a classification head is added to predict the class labels. The model is trained on the CIFAR-10 dataset and evaluated for accuracy and performance. The findings from this study demonstrate not only the effectiveness of the Vision Transformer model in image classification tasks but also its potential as a powerful tool for solving complex visual recognition problems. This research fills existing gaps in knowledge by introducing a novel approach that challenges traditional CNNs in the field of computer vision. While CNNs have been the dominant architecture for image classification tasks, they have limitations in capturing long-range dependencies in image data and require hand-designed hierarchical feature extraction.
- Conference Article
16
- 10.1109/picc51425.2020.9362375
- Dec 17, 2020
Image classification is the task of assigning an input image a label from a fixed set of labels. It is one of the main problems in computer vision and has many practical applications. For any classification problem, the main aim is to achieve high classification accuracy; when accuracy is low, misclassification occurs, which leads to various problems. Many classification models consider only instances of existing classes. When an instance of a new class arrives, such a model does not detect it properly and instead misclassifies it as an instance of an existing class. The proposed method therefore provides a more accurate model for both classification and new class detection in images. If needed, the new class can also be added to the model so that it is classified correctly in the future. Recent studies show that convolutional neural networks (CNNs) can be used effectively for image classification tasks, so the proposed classification and new class detection model is based on a CNN. A new class is detected by examining the trend of the softmax prediction scores of the class labels. In this work, the model is built for the CIFAR10 image dataset. Because this dataset is complex, a model built for it can serve as a base and be extended to classification and new class detection for other images in different applications.
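The abstract does not specify how the "trend" of the softmax scores is evaluated. As a rough sketch, one simple stand-in is to flag an input as a new class when the highest softmax probability falls below a confidence threshold. The function names and the `min_confidence` value below are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_flag_new(logits, labels, min_confidence=0.6):
    """Return (label, probability), or ("new class", probability) when confidence is low.

    Assumption: a flat softmax distribution signals an input from an unseen class.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "new class", probs[best]
    return labels[best], probs[best]

labels = ["airplane", "cat", "ship"]
confident = predict_or_flag_new([5.0, 0.5, 0.2], labels)
uncertain = predict_or_flag_new([1.0, 1.1, 0.9], labels)
```

A peaked score vector keeps its predicted label, while a nearly flat one is flagged; a real system would need the threshold tuned on held-out data.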
- Research Article
- 10.54254/2755-2721/2026.ch31978
- Mar 2, 2026
- Applied and Computational Engineering
The performance of image classification models depends greatly on the architectural decisions made. Fashion-MNIST, a mainstream benchmark adopted by researchers for model performance analysis, provides an avenue for the systematic comparison of different model architectures. In this paper, we comparatively study and analyze the performance of Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), Random Forests and Residual Networks (ResNet) on this dataset, and find that one reason for the excellent performance of convolutional networks may lie in the inherent ability of convolutional layers to extract spatial features. Although a deeper ResNet-34 performs very well (91.15%), its large number of parameters makes it less efficient for general tasks. To improve efficiency, we find that by increasing the number of channels in the first convolutional layer from 32 to 64, the achieved accuracy (92.44%) is superior to any single task, which verifies the effectiveness of width optimization. In summary, for applications such as Fashion-MNIST, a width-optimized convolutional network architecture achieves the best balance between accuracy and efficiency. We show empirically that for image classification tasks, model selection and lightweight design are significantly influenced by adopting appropriate architectural optimizations.
- Research Article
- 10.31577/cai_2020_6_1229
- Jan 1, 2020
- Computing and Informatics
During the past decades, numerous memory models have been proposed, which focused mainly on how spoken words are studied, whereas models of how visual images are studied are still limited. In this study, we propose a probabilistic memory model (PMM) for visual image categorization that is able to mimic the workings of the human brain during image storage and retrieval. First, in the learning phase, the visual images are represented by feature vectors extracted with a convolutional neural network (CNN); each feature component is assumed to conform to a Gaussian distribution and may be incompletely copied with a certain probability or randomly produced in accordance with an exponential distribution. Then, in the test phase, the likelihood ratio between the test image and each studied image is calculated based on probabilistic inference theory, and an odds value in favor of an old item over a new one is obtained from all likelihood values. Finally, if the odds value is above a certain threshold, the Bayesian decision rule is applied for image classification. Experimental results on two benchmark image datasets demonstrate that the proposed PMM performs well on categorization tasks for both studied and non-studied images.
- Conference Article
3
- 10.1109/ecice52819.2021.9645706
- Oct 29, 2021
Deep convolutional neural networks are one of the most popular research topics in the field of computer vision. They extract image feature information, have strong nonlinear classification ability and fast learning speed, and can be used for image recognition and classification. This paper uses these capabilities to study recognition and classification of oil painting schools. Using the ResNet structure of a deep convolutional neural network, a data set is constructed with a load data function, and an SEBlock module is then embedded, which can greatly improve the accuracy and generalization ability of the network's image recognition and classification. The SE model shows strong effectiveness and generalization ability. For example, the accuracy of SE-ResNet-34 is 1.73% higher than that of ResNet-34, and the accuracy of SE-ResNet-50 reaches that of ResNet-101. Applying the SE model to the deep convolutional neural network improves classification accuracy and reduces errors.
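For readers unfamiliar with the SE (squeeze-and-excitation) block mentioned above, the following is a minimal pure-Python sketch of its three steps: global average pooling per channel, a small two-layer gating network, and channel-wise rescaling. It omits biases and uses made-up weights `w1` and `w2`; it illustrates the general technique and is not the paper's SEBlock code.

```python
import math

def se_block(feature_maps, w1, w2):
    """Rescale each channel of feature_maps by a learned gate in (0, 1).

    feature_maps: list of channels, each a 2-D list of floats.
    w1: weights of the reducing layer, shape (hidden, channels).
    w2: weights of the expanding layer, shape (channels, hidden).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_maps, gates)]

# Two 2x2 channels with illustrative weights (one hidden unit).
maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled = se_block(maps, w1=[[0.5, 0.5]], w2=[[0.0], [2.0]])
```

Here the first channel's gate is sigmoid(0) = 0.5, so its values are halved, while the second channel's gate is close to 1 and leaves it almost unchanged.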
- Research Article
4
- 10.1051/matecconf/202032111084
- Jan 1, 2020
- MATEC Web of Conferences
Recent advances in machine learning and image recognition tools and methods are being used to address fundamental challenges in materials engineering, such as the automated extraction of statistical information from dual-phase titanium alloy microstructure images to support rapid engineering decision making. Initially, this work was performed by extracting dense layer outputs from a pretrained convolutional neural network (CNN), running the high-dimensional image vectors through a principal component analysis, and fitting a logistic regression model for image classification. K-fold cross-validation reported a mean validation accuracy of 83% over 19 different material pedigrees. Furthermore, it was shown that fine-tuning the pretrained network improved image classification accuracy by nearly 10% over the baseline. These image classification models were then used to determine and justify statistically equivalent representative volume elements (SERVE). Lastly, a convolutional neural network was trained and validated to make quantitative predictions from synthetic and real two-phase image datasets. This paper explores the application of convolutional neural networks for microstructure analysis in the context of aerospace engineering and material quality.
- Conference Article
4
- 10.1109/avss.2019.8909826
- Sep 1, 2019
Convolutional neural networks (CNNs) have become the method of choice for many computer vision applications, including image classification and action recognition. However, they are almost always computationally and memory intensive and are therefore challenging to use and deploy on systems with limited resources, except for a few recent networks specifically designed for mobile and embedded vision applications, such as MobileNet and NASNet-Mobile. In this paper, we present a novel, efficient algorithm to compress CNN models in order to decrease the computational cost and the run-time memory footprint. We propose a strategy to measure the redundancy of parameters based on their relationship, using covariance and correlation criteria, and then prune the less important ones. Our method applies directly to CNNs, on both convolutional and fully connected layers, and requires no specialized software or hardware accelerators. The proposed method significantly reduces model sizes (by up to 70%), and thus computing costs, without performance loss on different CNN models (AlexNet, ResNet, and LeNet) for image classification on different datasets (MNIST, CIFAR10, and ImageNet) as well as for human action recognition (on datasets such as UCF101).
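The abstract does not detail the redundancy criterion, so the following is only an illustrative sketch of correlation-based pruning. It treats each filter as a flat weight vector and greedily drops any filter whose absolute Pearson correlation with an already kept filter reaches a hypothetical cutoff `max_abs_corr`. The greedy order and the cutoff are assumptions, not the paper's algorithm.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length weight vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)

def select_filters(filters, max_abs_corr=0.95):
    """Return indices of filters to keep; near-duplicates of kept filters are pruned."""
    kept = []
    for i, f in enumerate(filters):
        if all(abs(pearson(f, filters[j])) < max_abs_corr for j in kept):
            kept.append(i)
    return kept

# The second filter is almost a scaled copy of the first, so it is treated as redundant.
filters = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, -1.0, 0.0, 2.0],
]
kept_indices = select_filters(filters)
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy; that step is outside the scope of this sketch.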