- Research Article
- 10.5565/rev/elcvia.2297
- Mar 4, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Hebron Prasetya + 5 more
Sea turtle species identification is vital for marine biodiversity conservation, as sea turtles help balance marine ecosystems by consuming dead seagrass and maintaining coral reefs. They help preserve the health of seagrass beds and coral reefs that benefit commercially valuable species. Therefore, to sustain sea turtle populations, detection systems that facilitate conservation efforts are essential. In developing underwater detection models, researchers must address several challenges specific to the underwater environment, including low illumination, complex backgrounds, and underwater blur. In addition, YOLOv10-nano has emerged as the most efficient object detector in its family, though improving its performance remains a challenge. To address this, we propose TurtleNet, an advanced deep learning approach based on a modified YOLOv10-nano with a new Parallel Fusion Module (PFM) integrated into the backbone alongside self-attention to enhance detection performance. The Parallel Fusion Module improves detection by capturing channel-wise representational features, emphasizing channels that carry relevant information through a dual-scaling process and thereby improving feature quality. The PFM is integrated into the untouched branch of the Partial Self-Attention mechanism to enrich the split half of the feature channels. Our model is trained on 48,302 images from Bunaken National Marine Park containing Green, Hawksbill, and Olive Ridley turtles, with data augmentation applied. The method leverages YOLOv10-nano's real-time detection capabilities while the PFM optimizes feature fusion and localization accuracy. Experimental results show that our model achieves an mAP50 of 0.856 and runs at 28 FPS on CPU devices, outperforming existing approaches in precision, recall, and efficiency. This research combines computer vision with marine biology, creating an automated system that helps researchers and conservationists monitor endangered turtles.
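The abstract describes the PFM as emphasizing informative channels through a dual-scaling process, but does not give the module's exact design. The following is only a minimal numpy sketch of generic dual-gate channel re-weighting; the function name and the average/max gate formula are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_scale_channels(feat):
    """Toy channel re-weighting: scale each channel by gates derived
    from its global average- and max-pooled statistics.
    feat: array of shape (C, H, W)."""
    avg = feat.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))            # (C,) max-pooled descriptor
    gate = sigmoid(avg) * sigmoid(mx)     # dual-scaling gate per channel
    return feat * gate[:, None, None]     # re-weighted feature map

feat = np.random.rand(8, 16, 16).astype(np.float32)
out = dual_scale_channels(feat)
print(out.shape)  # (8, 16, 16)
```

Because the gates lie in (0, 1), each channel is attenuated in proportion to how weakly its pooled statistics activate, which is the general behavior channel re-weighting modules rely on.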
- Research Article
- 10.5565/rev/elcvia.2217
- Mar 4, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Amer Tabbahk + 1 more
This paper presents a novel approach to reconstructing the topology of a deep learning model to reduce its trainable parameters, called the Binary Feature Map-Splitting Architecture (BFMSA). The proposed approach is trained on the PlantVillage dataset for plant disease classification. A simple CNN-based BFMSA and BFMSA variants of several pre-trained models, such as InceptionV3, ResNet50, VGG19, and VGG16, are evaluated. The research makes two main contributions. First, it reduces the computational cost of building a CNN model from scratch based on BFMSA, where the reduction lies in the feature extraction and classification phases. Second, it reduces the computational cost of building a transfer learning model, where the reduction lies in the classification phase. The study compares the proposed architecture with the traditional architecture and evaluates performance using metrics such as accuracy, loss, F1-score, precision, and recall. The findings indicate reduced overfitting and improved validation accuracy for the proposed architecture. The BFMSA-based CNN model achieved the highest validation accuracy, 98.31%, compared with the traditional architecture, while the VGG16-based BFMSA achieved the highest validation accuracy among the BFMSA-based transfer learning models, 97.32%. Additionally, the proposed architecture decreases the trainable parameters by up to 87% compared to traditional models.
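The parameter savings from splitting a feature map into independently processed halves can be seen with simple counting. The layer widths below are illustrative assumptions, not figures from the paper; the sketch only shows why halving the width of two parallel branches roughly halves a dense layer's parameter count.

```python
# Parameter count of a dense layer with n_in inputs and n_out outputs,
# versus splitting the feature vector into two half-width branches
# processed independently (a simplified view of feature-map splitting).
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

full = dense_params(512, 512)        # one full-width layer
split = 2 * dense_params(256, 256)   # two independent half-width branches
print(full, split)  # 262656 131584
```

The split version keeps roughly half the trainable parameters because each branch's weight matrix is a quarter the size of the full one, and there are two branches.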
- Research Article
- 10.5565/rev/elcvia.2132
- Mar 3, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Mainak Bandyopadhyay + 2 more
Technological advances in unmanned aerial vehicles have made the detection of drones in flight increasingly challenging. Micro-Doppler signatures obtained from radar are used to distinguish and detect different types of drones, but because radar spectrogram image patterns, or micro-Doppler signatures, can be quite similar, classifying different drone types is sometimes very difficult. Previously, deep learning methods such as transfer learning and residual networks have been proposed to improve classification accuracy. To further improve classification efficiency, this paper investigates the integration of channel attention mechanisms, namely Squeeze-and-Excitation Net, Efficient Channel Attention, and Gated Channel Transformation, into a custom CNN (UAVDetect), using three publicly available micro-Doppler spectrogram UAV datasets. The paper proposes a Modified SENet and a Modified ECA, which further improve accuracy and convergence.
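Of the three channel attention mechanisms named, Squeeze-and-Excitation is the simplest to sketch: global-average-pool each channel, pass the descriptor through a bottleneck MLP, and rescale the channels with a sigmoid gate. The numpy toy below uses random bottleneck weights (learned in a trained network) and is a shape-level illustration, not the paper's modified variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(feat, r=2):
    """Squeeze-and-Excitation sketch: squeeze spatial dims to per-channel
    statistics, excite through a bottleneck MLP, rescale the channels.
    feat: (C, H, W); r: channel reduction ratio."""
    c = feat.shape[0]
    w1 = rng.standard_normal((c // r, c)) * 0.1   # reduction weights (toy)
    w2 = rng.standard_normal((c, c // r)) * 0.1   # expansion weights (toy)
    z = feat.mean(axis=(1, 2))                    # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)                   # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))        # sigmoid gate: (C,)
    return feat * gate[:, None, None]             # excite: rescale channels

feat = rng.random((16, 8, 8))
out = se_block(feat)
print(out.shape)  # (16, 8, 8)
```

ECA replaces the bottleneck MLP with a 1-D convolution over the channel descriptor, and GCT uses learned normalization-based gating; all three share this squeeze-then-rescale structure.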
- Research Article
- 10.5565/rev/elcvia.2147
- Mar 2, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Takwa Ben Aïcha Gader + 4 more
Head and neck (HaN) cancer is a common type of cancer. Radiotherapy treats it by targeting cancerous cells while avoiding healthy organs, which requires a precise delineation of the targeted areas and adjacent organs at risk (OARs), typically performed on Computed Tomography (CT) images. However, some OARs in the head and neck region are better observed in magnetic resonance (MR) images. Therefore, we propose a fully automated system for OAR segmentation using CT images together with other imaging modalities. More specifically, we use the patient's CT and MR images to identify 30 organs that may be at risk. We propose a 3D-UNet model for volumetric segmentation that accurately captures spatial relationships. The model has skip connections for feature propagation, improving segmentation, and can handle multimodal inputs to integrate complementary imaging information for more precise segmentation. Our proposed model achieved a training accuracy of 94.2% and a test accuracy of 94.1%, which is competitive with related works.
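Two ingredients the abstract highlights, multimodal input and skip connections, both come down to channel-wise concatenation in a 3D U-Net. A minimal numpy sketch with illustrative shapes (the volume sizes and channel counts are assumptions, not the paper's configuration):

```python
import numpy as np

# Multimodal input: stack co-registered CT and MR volumes as input
# channels, so one network sees both modalities at every voxel.
ct = np.random.rand(1, 64, 64, 64)    # (C, D, H, W) CT volume
mr = np.random.rand(1, 64, 64, 64)    # co-registered MR volume
x = np.concatenate([ct, mr], axis=0)  # 2-channel multimodal input

# Skip connection: encoder features at a given resolution are
# concatenated with the upsampled decoder features at that resolution,
# so fine spatial detail propagates to the segmentation head.
enc = np.random.rand(8, 32, 32, 32)   # encoder feature map
dec = np.random.rand(8, 32, 32, 32)   # upsampled decoder feature map
merged = np.concatenate([enc, dec], axis=0)
print(x.shape, merged.shape)  # (2, 64, 64, 64) (16, 32, 32, 32)
```

In a real 3D-UNet the concatenated tensors feed into 3D convolutions; the sketch only shows where the two data paths join.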
- Research Article
- 10.5565/rev/elcvia.2019
- Mar 2, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Aouadi Nabil
This paper addresses the problem of extracting and separating touching characters (TCs) in Arabic manuscripts. It proposes a recognition-based method to separate them and join each component piece to its corresponding word. The proposed method extracts the TCs in the document, whether between successive text lines or between words of the same text line. For vertical (up-down) TCs, we improved an existing method. For horizontal (left-right) TCs, we proposed a novel extraction method based on the morphological analysis of the terminal letters of Arabic words. The method then recognizes the TCs against templates, using the shape context descriptor and an interpolation function, the Thin Plate Spline (TPS) transformation. Finally, it segments them based on the distance from the central points (midpoints or gravity centers) of the recognized template's parts. Tests are performed using a large dataset of TCs and three metrics: Manhattan, Euclidean, and Canberra distances. The results strongly support the efficiency of the proposed TC extraction and segmentation methods and outperform some related works, taking into account the different types, variability, and complexity of the TCs.
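The shape context descriptor used for template recognition is a log-polar histogram of where the other contour points lie relative to a reference point. A simplified numpy sketch follows; the bin counts and normalization are illustrative assumptions, and the paper's full pipeline additionally matches descriptors and warps templates with TPS.

```python
import numpy as np

def shape_context(points, idx, n_r=3, n_theta=4):
    """Toy shape context: log-polar histogram of the positions of all
    other contour points relative to points[idx]."""
    p = points[idx]
    rest = np.delete(points, idx, axis=0)
    d = rest - p
    r = np.log1p(np.hypot(d[:, 0], d[:, 1]))           # log radial distance
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    r_bin = np.minimum((r / (r.max() + 1e-9) * n_r).astype(int), n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    hist = np.zeros((n_r, n_theta))
    for rb, tb in zip(r_bin, t_bin):
        hist[rb, tb] += 1
    return hist

pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
h = shape_context(pts, 0)
print(h.sum())  # 4.0 — one count per other contour point
```

Because the histogram counts relative positions, similar glyph shapes produce similar descriptors regardless of where they sit on the page, which is what makes template-based recognition of touching characters feasible.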
- Research Article
- 10.5565/rev/elcvia.1804
- Feb 17, 2026
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Diwakar Diwakar + 1 more
Chest disorders are widespread globally, encompassing conditions such as COVID-19, pneumonia, tuberculosis, and fibrosis. The diagnostic process often relies on chest X-ray (CXR) images, and manual diagnosis is a laborious and challenging endeavor because these diseases share similar symptoms and imaging characteristics. In contrast, deep learning technologies offer a more efficient and cost-effective approach to analyzing CXR images for diagnostic purposes. This paper introduces an integrated model, utilizing both VGG16 and VGG19 architectures, coupled with Principal Component Analysis (PCA) and a feature fusion technique for the classification of multiple diseases. The model covers four classes: COVID-19, normal, pneumonia, and tuberculosis, making it suitable for real-time applications. The dataset employed in this study is sourced from the Kaggle repository. Our proposed model achieves an accuracy of 97.50%, with a training time of approximately 4 seconds. Comparative analyses with other existing models are conducted to validate the effectiveness of the proposed approach.
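Feature fusion followed by PCA, as described, amounts to concatenating the two backbones' feature vectors and projecting onto the leading principal components. A numpy sketch with random stand-ins for the actual VGG16/VGG19 features; the feature dimensions and component count are illustrative assumptions.

```python
import numpy as np

def pca_reduce(x, k):
    """Project the row-vectors in x onto their top-k principal components."""
    xc = x - x.mean(axis=0)                          # center the features
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T                             # (n_samples, k)

rng = np.random.default_rng(1)
f_vgg16 = rng.random((10, 64))   # stand-in for VGG16 features, 10 images
f_vgg19 = rng.random((10, 64))   # stand-in for VGG19 features
fused = np.concatenate([f_vgg16, f_vgg19], axis=1)   # feature fusion
reduced = pca_reduce(fused, k=8)                     # PCA compression
print(fused.shape, reduced.shape)  # (10, 128) (10, 8)
```

Compressing the fused vector before the classifier is what keeps training fast: the final classifier sees 8 dimensions here instead of 128.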
- Research Article
- 10.5565/rev/elcvia.2193
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Doni Rizqi Setiawan + 4 more
Corn is one of Indonesia's main food crops and its second-largest source of carbohydrates after rice. Classification of the type and quality of corn seeds is still carried out manually by farmers; this procedure is time-consuming and can lead to sorting inaccuracies. Morphological characteristics such as size, color, area, and seed shape are important for determining varieties, but measuring these attributes manually takes a long time and requires special expertise. A suitable way to describe these characteristics is to use machine learning, specifically Convolutional Neural Networks (CNNs). The CNN models used are ResNet101, ResNet50, VGG-19, and MobileNetV2, and their performance was analyzed using a confusion matrix. For the classification of corn seed varieties, the ResNet101 model showed an accuracy of 89.8%, a precision of 86.9%, a recall of 88.3%, and an F1-score of 86.4%. The ResNet50 model showed an accuracy of 86.27%, a precision of 83.2%, a recall of 84.1%, and an F1-score of 83.4%. The VGG-19 model showed an accuracy of 76.47%, a precision of 66.8%, a recall of 78.%, and an F1-score of 71.1%. Meanwhile, the MobileNetV2 model showed an accuracy of 73.34%, a precision of 69%, a recall of 69.8%, and an F1-score of 69.8%.
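All four reported metrics can be derived from a single confusion matrix. A small sketch follows; the 3×3 matrix is made up for illustration and does not correspond to the paper's classes or counts.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy plus macro-averaged precision/recall/F1 from a
    multi-class confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                                  # correct per class
    prec = tp / np.maximum(cm.sum(axis=0), 1e-9)      # per-class precision
    rec = tp / np.maximum(cm.sum(axis=1), 1e-9)       # per-class recall
    f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-9)
    acc = tp.sum() / cm.sum()
    return acc, prec.mean(), rec.mean(), f1.mean()

cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
acc, p, r, f1 = metrics_from_confusion(cm)
print(round(acc, 3))  # 0.867
```

Macro-averaging treats all classes equally, which is why precision, recall, and F1 can differ noticeably from accuracy when the per-class errors are uneven, as in the reported results.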
- Research Article
- 10.5565/rev/elcvia.2023
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Prabodh Kumar Sahoo + 3 more
The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, playing a vital role in maintaining security and ensuring effective monitoring. The primary aim of such systems is to detect objects in motion and tackle real-world challenges effectively. Despite the existence of numerous methods, there remains room for improvement, particularly for slowly moving video sequences and unfamiliar video environments. When slow-moving objects are confined to a small area of the frame, many traditional methods fail to detect the entire object. An effective solution is a spatial-temporal framework, and the selection of temporal, spatial, and fusion algorithms is crucial for effectively detecting slow-moving objects. This article addresses the detection of slowly moving objects in challenging videos by leveraging an encoder-decoder architecture that incorporates a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm: it utilizes a pre-trained modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy. The encoder is designed with a reduced number of layers and incorporates skip connections to extract the essential fine- and coarse-scale features crucial for local change detection. The feature pooling framework (FPF) combines different layers, including max pooling, convolutional, and several atrous convolutional layers with varying sampling rates. This integration preserves features of various dimensions at different scales, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that effectively map features to image space.
The performance of the developed technique is assessed against various existing methods, including CMRM, the Hybrid algorithm, Fast valley, EPMCB, and MODCVS, showcasing its effectiveness through both subjective and objective analyses. It demonstrates superior performance, with an average F-measure (AF) of 98.86% and a lower average misclassification error (AMCE) of 0.85. Furthermore, the algorithm's effectiveness is validated on Imperceptible Video Configuration setups, where it also exhibits superior performance.
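The atrous (dilated) convolutions in the feature pooling framework widen the receptive field without adding weights by spacing the kernel taps apart. A 1-D numpy sketch of the idea; the kernel and dilation rates are illustrative, not the paper's configuration.

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """'Atrous' (dilated) 1-D convolution: taps are spaced `rate`
    samples apart, enlarging the receptive field with the same weights."""
    k = len(w)
    span = (k - 1) * rate + 1            # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, w, rate=1))  # dense taps, receptive field 3
print(dilated_conv1d(x, w, rate=2))  # same 3 weights, receptive field 5
```

Running several such branches at different rates in parallel and pooling them, as the FPF does, captures context at multiple scales for the same parameter budget.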
- Research Article
- 10.5565/rev/elcvia.1909
- Nov 20, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Shahi D + 2 more
High-capacity steganography remains challenging in the field of information security, and the demand for exact retrieval of the cover media from the stego-image after extraction of the secret data is also increasing. Using reversible information hiding techniques, the cover image can be recovered at the time of extraction of the secret messages. Two techniques are proposed in this paper. In the first, the image is interpolated using a new interpolation technique; the second uses a High Capacity Reversible Steganography using Multi-layer Embedding (CRS) method for image interpolation. In both techniques, the secret data are embedded in the cover image by an Exclusive OR (XOR) operation. The proposed techniques provide high embedding capacity and preserve image quality. The experimental results show that the proposed techniques offer better results than the existing techniques.
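The reversibility claim rests on a basic property of XOR: applying the same mask twice returns the original value. A toy pure-Python sketch follows; the pixel values and embedding positions are illustrative and do not reproduce the papers' interpolation schemes.

```python
# Toy XOR embedding: each secret bit (0 or 1) is XOR-ed into a pixel,
# flipping only its least significant bit. Extraction is a second XOR
# against the known reference values, so the scheme is reversible.
def embed(pixels, secret_bits):
    return [p ^ b for p, b in zip(pixels, secret_bits)]

def extract(stego, pixels):
    return [s ^ p for s, p in zip(stego, pixels)]

interp = [100, 101, 102, 103]        # interpolated pixel values
secret = [1, 0, 1, 1]                # secret bit-stream
stego = embed(interp, secret)
print(extract(stego, interp))        # [1, 0, 1, 1] — secret recovered
```

Embedding into interpolated (non-cover) pixels is what lets the original cover samples survive untouched, which is the reversibility property the abstract emphasizes.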
- Research Article
- 10.5565/rev/elcvia.1597
- Oct 18, 2025
- ELCVIA Electronic Letters on Computer Vision and Image Analysis
- Sakshiwala + 1 more
This paper reviews state-of-the-art literature on the early diagnosis of lung cancer with deep neural network techniques and chest CT scans. First, a brief introduction to the significance of lung cancer and the need for this review is stated. The architectures of the deep neural networks, evaluation methods, and the comprehensive review of recent progress in lung cancer diagnosis based on deep neural network techniques are provided. Further, the comparative analysis of the literature is presented. A critical discussion on the existing datasets, various methodologies, and challenges in the diagnosis are presented. The performances of deep neural network-based techniques for segmentation, nodule detection, and nodule classification are also discussed. This review covers the malignancy classification along with the nodule detection tasks. Thus, this may provide necessary information to all the researchers to prepare a robust methodology for early detection of lung cancer and hence proper diagnosis.