- Research Article
- 10.5565/rev/elcvia.2193
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Doni Rizqi Setiawan + 4 more
Corn is one of Indonesia's main food crops and its second-largest source of carbohydrates after rice. Classification of the type and quality of corn seeds is still conducted manually by farmers, a procedure that is time-consuming and can result in sorting inaccuracies. Morphological characteristics such as size, color, area and seed shape are important for determining varieties, but measuring these attributes manually takes a long time and requires special expertise. A suitable way to describe these characteristics is machine learning, in particular the Convolutional Neural Network (CNN). The CNN models used are ResNet101, ResNet50, VGG-19 and MobileNetV2. Model performance was analyzed using a confusion matrix. For the classification of corn seed varieties, the ResNet101 model showed an accuracy of 89.8%, a precision of 86.9%, a recall of 88.3% and an F1-score of 86.4%. The ResNet50 model showed an accuracy of 86.27%, a precision of 83.2%, a recall of 84.1% and an F1-score of 83.4%. The VGG-19 model showed an accuracy of 76.47%, a precision of 66.8%, a recall of 78% and an F1-score of 71.1%. The MobileNetV2 model showed an accuracy of 73.34%, a precision of 69%, a recall of 69.8% and an F1-score of 69.8%.
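The confusion-matrix metrics reported above (accuracy plus macro-averaged precision, recall and F1-score) can be computed as in the following generic sketch in plain Python; this is illustrative code, not code from the paper:

```python
def metrics_from_confusion(cm):
    # cm[i][j]: number of samples with true class i predicted as class j
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but wrong
        fn = sum(cm[k][j] for j in range(n)) - tp  # true k, but missed
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # macro averaging: each class weighted equally
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

For a toy 2x2 matrix `[[5, 1], [2, 4]]` this yields an accuracy of 0.75 with macro precision, recall and F1 computed per class and averaged.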
- Research Article
- 10.5565/rev/elcvia.2023
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Prabodh Kumar Sahoo + 3 more
The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, playing a vital role in maintaining security and ensuring effective monitoring. However, the primary aim of such systems is to detect objects in motion and tackle real-world challenges effectively. Despite the existence of numerous methods, there remains room for improvement, particularly in slowly moving video sequences and unfamiliar video environments. In videos where slow-moving objects are confined to a small area, many traditional methods fail to detect the entire object. An effective solution is the spatial-temporal framework, and the selection of temporal, spatial, and fusion algorithms is crucial for effectively detecting slow-moving objects. This article presents a notable effort to address the detection of slowly moving objects in challenging videos by leveraging an encoder-decoder architecture incorporating a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm: it utilizes a pre-trained modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy. The encoder is designed with a reduced number of layers and incorporates skip connections to extract the essential fine and coarse-scale features crucial for local change detection. The feature pooling framework (FPF) combines several kinds of layers, including max pooling, convolutional, and multiple atrous convolutional layers with varying sampling rates. This integration preserves features at different scales and dimensions, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that effectively map features to image space.
The performance of the developed technique is assessed in comparison to various existing methods, including those by CMRM, Hybrid algorithm, Fast valley, EPMCB, and MODCVS, showcasing its effectiveness through both subjective and objective analyses. It demonstrates superior performance, with an average F-measure (AF) value of 98.86% and a lower average misclassification error (AMCE) value of 0.85. Furthermore, the algorithm’s effectiveness is validated on Imperceptible Video Configuration video setups, where it exhibits superior performance.
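The multi-rate atrous (dilated) convolutions at the heart of the feature pooling framework can be illustrated with a one-dimensional toy example; the kernel, the rates (1, 2, 4) and the simple concatenation below are illustrative assumptions, not the paper's actual configuration:

```python
def dilated_conv1d(x, w, rate):
    # valid-mode 1-D convolution whose taps are spaced 'rate' samples apart,
    # so the receptive field grows with the dilation rate at no extra cost
    k = len(w)
    span = (k - 1) * rate
    return [sum(w[j] * x[i + j * rate] for j in range(k))
            for i in range(len(x) - span)]

def feature_pool(x, w, rates=(1, 2, 4)):
    # parallel dilated branches capture context at several scales and are
    # concatenated -- a toy analogue of the feature pooling framework (FPF)
    feats = []
    for r in rates:
        feats.extend(dilated_conv1d(x, w, r))
    return feats
```

With a length-9 input and a 3-tap kernel, the three branches see spans of 3, 5 and 9 samples respectively, which is the multi-scale effect the FPF relies on.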
- Research Article
- 10.5565/rev/elcvia.1909
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Shahi D + 2 more
High capacity steganography is still challenging today in the field of information security. The demand for the exact retrieval of the cover media from the stego-image after the extraction of secret data is also increasing. Using reversible information hiding techniques, the cover image can be recovered at the time of extraction of secret messages. Two techniques are proposed in this paper. In the first technique, the image is interpolated using a new interpolation technique; the second technique uses a High Capacity Reversible Steganography using Multi-layer Embedding (CRS) method for image interpolation. In both techniques, the secret data are embedded in the cover image by an Exclusive OR (XOR) operation. The proposed techniques give high embedding capacity and preserve image quality. The experimental results show that the proposed techniques offer better results over the existing techniques.
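A minimal sketch of interpolation-based reversible embedding with XOR, shown on a 1-D "image" for brevity; the neighbor-mean interpolation and one payload bit per interpolated sample are simplifying assumptions, not the exact schemes proposed in the paper:

```python
def interpolate(cover):
    # upscale: originals at even indices, neighbor means at odd (hypothetical)
    out = []
    for i in range(len(cover) - 1):
        out.append(cover[i])
        out.append((cover[i] + cover[i + 1]) // 2)
    out.append(cover[-1])
    return out

def embed(cover, secret_bits):
    # XOR secret bits only into interpolated samples; originals stay untouched
    img = interpolate(cover)
    bits = iter(secret_bits)
    for i in range(1, len(img), 2):
        try:
            img[i] ^= next(bits)
        except StopIteration:
            break
    return img

def extract(stego):
    cover = stego[::2]          # untouched originals -> exact cover recovery
    ref = interpolate(cover)    # regenerate interpolated values
    secret = [stego[i] ^ ref[i] for i in range(1, len(stego), 2)]
    return cover, secret
```

Because only the interpolated samples carry data, the extractor can regenerate them from the recovered cover and XOR out the secret, which is what makes the scheme reversible.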
- Research Article
- 10.5565/rev/elcvia.1597
- Oct 18, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Sakshiwala + 1 more
This paper reviews state-of-the-art literature on the early diagnosis of lung cancer with deep neural network techniques and chest CT scans. First, a brief introduction to the significance of lung cancer and the need for this review is stated. The architectures of the deep neural networks, evaluation methods, and the comprehensive review of recent progress in lung cancer diagnosis based on deep neural network techniques are provided. Further, the comparative analysis of the literature is presented. A critical discussion on the existing datasets, various methodologies, and challenges in the diagnosis are presented. The performances of deep neural network-based techniques for segmentation, nodule detection, and nodule classification are also discussed. This review covers the malignancy classification along with the nodule detection tasks. Thus, this may provide necessary information to all the researchers to prepare a robust methodology for early detection of lung cancer and hence proper diagnosis.
- Research Article
- 10.5565/rev/elcvia.1827
- Oct 18, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Youcef Attallah
Hyperspectral remote sensing has emerged as a powerful tool for vegetation classification due to its ability to capture detailed spectral information. This study introduces a novel methodology for vegetation classification using exclusively hyperspectral imagery. The proposed approach comprises atmospheric correction using the FLAASH algorithm, followed by dimensionality reduction using PCA and segmentation through ROI selection and the Spectral Angle Mapper (SAM) module. Subsequently, a deep autoencoder is employed for feature extraction, paving the way for classification using the Multi-Layer Perceptron (MLP) algorithm. The effectiveness of this methodology is evaluated using a hyperspectral image of the Saint Clair River, successfully classifying the image into six main classes: water 1, water 2, grass, tree, reed and corn, plus an 'unclassified' category encompassing concrete, roads, bricks, wood, and more. Our findings demonstrate the efficacy of this approach in accurately classifying and mapping vegetation in river ecosystems, offering a promising solution in the face of limited hyperspectral datasets.
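The Spectral Angle Mapper step can be sketched as follows; the two-band pixels, reference spectra and angle threshold are toy assumptions for illustration only:

```python
import math

def spectral_angle(x, y):
    # angle in radians between two spectra; insensitive to illumination scaling
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

def sam_classify(pixel, refs, threshold):
    # refs: {class_name: reference_spectrum}; smallest angle wins,
    # and pixels too far from every reference fall into 'unclassified'
    best, angle = None, float("inf")
    for name, spectrum in refs.items():
        a = spectral_angle(pixel, spectrum)
        if a < angle:
            best, angle = name, a
    return best if angle <= threshold else "unclassified"
```

The thresholded fallback mirrors the study's 'unclassified' category for materials (concrete, roads, bricks, wood) that match no vegetation or water reference.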
- Journal Issue
- 10.5565/rev/elcvia.2422025
- Oct 18, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Research Article
- 10.5565/rev/elcvia.2043
- May 21, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Anjali S + 1 more
Anomaly detection in video is essential for applications like surveillance, healthcare, and industrial monitoring. Convolutional autoencoders detect anomalies by reconstructing normal patterns and computing the reconstruction error with respect to ground truth. Frames with errors above a threshold are flagged as abnormal. Existing approaches rely on fixed thresholds, which may not adapt well to varying lighting conditions, leading to false positives or missed anomalies. This work proposes a novel autoencoder (SESAA) that combines self-attention with squeeze-and-excitation (SE) blocks and improves video anomaly detection through adaptive thresholding. The adaptive thresholding technique leverages reconstruction cost, peak signal-to-noise ratio (PSNR) and frame brightness to identify an optimal threshold, enhancing adaptability to different scenarios. We compare our model with dynamic threshold methods, assessing it using ROC and AUC metrics. Experiments on three benchmark datasets validate the efficacy of our method in precise anomaly detection through optimal thresholding.
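A toy sketch of brightness-aware adaptive thresholding on reconstruction errors; the mean-plus-k-sigma rule and the linear brightness scaling are assumptions for illustration, and the paper additionally folds PSNR into its threshold selection:

```python
import math

def psnr(mse, peak=255.0):
    # peak signal-to-noise ratio of a reconstruction, in dB
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def adaptive_threshold(normal_errors, frame_brightness, ref_brightness, k=1.5):
    # baseline bound from reconstruction errors observed on normal frames
    n = len(normal_errors)
    mu = sum(normal_errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in normal_errors) / n)
    # darker frames tend to yield smaller absolute errors, so scale the
    # bound by relative brightness (a simplifying assumption)
    scale = frame_brightness / ref_brightness
    return (mu + k * sigma) * scale

def is_anomalous(error, threshold):
    return error > threshold
```

A fixed threshold would over-flag dark scenes and under-flag bright ones; letting the bound track frame brightness is one simple way to restore that balance.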
- Research Article
- 10.5565/rev/elcvia.2020
- May 21, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Sakthi Priya G + 2 more
The proposed study explores the application of transfer learning techniques to bird species image classification, focusing on the MobileNet and InceptionV3 models. Transfer learning is employed on the CUB-200-2011 dataset to enhance classification accuracy. The MobileNet model achieved an accuracy of 74.60%, outperforming InceptionV3, which recorded an accuracy of 64.00%. The corresponding loss values were 0.8685 for MobileNet and 1.128 for InceptionV3, highlighting MobileNet's superior alignment with actual class labels. Additionally, MobileNet demonstrated a per-class precision range of 0.45 to 0.93, while InceptionV3's precision ranged from 0.65 to 0.81. The F1-scores showed MobileNet's performance ranging from 0.40 to 0.91, in contrast to InceptionV3's lower F1-scores, indicating a more stable but less effective classification ability for the latter. These findings underscore the potential of MobileNet as a lightweight, efficient alternative for wildlife image classification tasks, making it particularly suitable for deployment in resource-constrained environments. The developed user interface allows for seamless interaction, enabling users to upload images and receive immediate classification results, further demonstrating the practical application of these models in conservation and biodiversity preservation efforts.
- Research Article
- 10.5565/rev/elcvia.2009
- May 21, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Menaka Radhakrishnan
Maintaining optimal yield plays a crucial role in the prosperity of agriculture and, in turn, the economy of the country. One way to optimize this yield is early and accurate detection and diagnosis of crop diseases. Traditional methods that involve manual inspection tend to be tedious and often inaccurate, so the use of machine learning and convolutional neural networks has proven to be of great advantage in terms of accuracy, reliability and ease of implementation. This paper explores various deep learning models such as AlexNet, ResNet, Swin Transformer, VGG-16 and ViT for plant leaf disease detection and classification on a dataset of mango leaves, and compares aspects such as accuracy and loss. Further, the models have been combined using feature fusion and their accuracies compared. Finally, a combination of ResNet and AlexNet has been proposed with an impressive accuracy of 99.97%. Further, Grad-CAM (Gradient-weighted Class Activation Mapping) has been implemented to highlight important regions in the leaf images, which improves visualization. This can potentially provide an accurate identification and classification of plant diseases based on leaf images.
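The Grad-CAM step mentioned above reduces to weighting each feature map by the global average of its gradient and passing the weighted sum through a ReLU; a minimal plain-Python sketch on toy 2x2 maps, not the paper's network:

```python
def grad_cam(activations, gradients):
    # activations: K feature maps (2-D lists) from the last conv layer;
    # gradients: gradients of the class score w.r.t. those maps, same shape
    K = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for k in range(K):
        # channel weight = global-average-pooled gradient (Grad-CAM definition)
        wk = sum(sum(row) for row in gradients[k]) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * activations[k][i][j]
    # ReLU keeps only regions with positive influence on the class score
    return [[max(0.0, v) for v in row] for row in cam]
```

Upsampling the resulting map to the input resolution and overlaying it on the leaf image gives the highlighted-region visualization the paper describes.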
- Research Article
- 10.5565/rev/elcvia.1850
- May 8, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Vrinda Kore + 4 more
Sanskrit is widely acknowledged to be among the world's oldest surviving classical languages, and yet its usage has continued to decline unabated in the present milieu. Such insidious erosion of popularity is directly attributable to the absence of native speakers of the language and the perceived inaccessibility of Sanskrit to contemporary audiences. Notwithstanding, the language remains historically and culturally inseparable from the subcontinent, with numerous religious manuscripts, epigraphical inscriptions, edicts and scientific literature written in the Sanskrit script. Attempts made to resuscitate the language have been largely unsuccessful as these attempts have relied extensively on laborious human transcription and translation. Such manual endeavors can be superseded by the use of efficient computational techniques to facilitate the efficient transcription of voluminous manuscripts written in the Sanskrit script. The emergence of deep learning frameworks has enabled researchers to overcome the drawbacks of conventional machine learning algorithms in developing efficient and extensible character recognition systems. Notwithstanding, the advancement of character recognition frameworks varies across different Indic scripts. In this context, this paper introduces an extensible framework for the transcription of handwritten Sanskrit manuscripts. In the absence of a benchmark dataset of handwritten Sanskrit characters, the authors introduce a comprehensive dataset to facilitate further downstream segmentation. The dataset, on augmentation, comprises over a hundred thousand samples and has been collected from over a hundred individuals. The paper explores an integrated approach to segmentation and accordingly delineates a systematic methodology for effectively segmenting Sanskrit words, incorporating techniques such as thresholding, zone-based classification, median bisection and projection profiles.
The proposed technique accommodates a diverse array of characters and modifiers present in the Sanskrit script. Subsequently, a concurrent deep learning architecture parallelizes transcription using neural networks (CNNs and residual networks). The deep learning models show accuracies exceeding 90%. This paper attempts to benchmark the significance of systematic approaches to machine transcription of low-resource languages.
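The projection-profile segmentation used in the word-segmentation pipeline can be sketched on a toy binary image; splitting at zero-runs of the vertical profile is a simplified stand-in for the paper's full zone-based procedure:

```python
def projection_profile(binary_img, axis=0):
    # axis=0: column sums (vertical profile); axis=1: row sums (horizontal)
    if axis == 1:
        return [sum(row) for row in binary_img]
    return [sum(col) for col in zip(*binary_img)]

def segment_columns(profile):
    # split the word at zero-runs of the vertical profile, i.e. the blank
    # columns between glyphs; returns (start, end) index pairs
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

In practice the horizontal profile is used first (e.g. to locate and strip the shirorekha header line) before the vertical profile separates individual characters.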