BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception
Visual illusions play a significant role in understanding visual perception. Current methods for understanding and evaluating visual illusions are mostly deterministic, filtering-based approaches, and they are evaluated on only a handful of visual illusions, so their conclusions are not generic. To this end, we generate a large-scale dataset of 22,366 images (BRI3L: BRightness Illusion Image dataset for Identification and Localization of illusory perception) covering five types of brightness illusions, and benchmark the dataset using data-driven, neural-network-based approaches. The dataset contains two kinds of label information: (1) whether a particular image is illusory/non-illusory, and (2) the segmentation mask of the illusory region of the image. Hence, both the classification and segmentation tasks can be evaluated using this dataset. We follow standard psychophysical experiments involving human subjects to validate the dataset. To the best of our knowledge, this is the first attempt to develop a dataset of visual illusions and to benchmark data-driven approaches for illusion classification and localization. We consider five well-studied types of brightness illusions: 1) Hermann grid, 2) Simultaneous Brightness Contrast, 3) White illusion, 4) Grid illusion, and 5) Induced Grating illusion. Benchmarking on the dataset achieves 99.56% accuracy in illusion identification and 84.37% pixel accuracy in illusion localization. The deep learning models are also shown to generalize to unseen brightness illusions, such as brightness-assimilation-to-contrast transitions. We also test the ability of state-of-the-art diffusion models to generate brightness illusions. All code, data, and instructions are provided in the GitHub repo: https://github.com/aniket004/BRI3L
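The two benchmark figures quoted above correspond to standard metrics: per-image classification accuracy for illusion identification and per-pixel accuracy against the segmentation mask for localization. A minimal illustrative sketch of these metrics (not the authors' evaluation code) in numpy:

```python
import numpy as np

def classification_accuracy(y_true, y_pred):
    # fraction of images whose illusory/non-illusory label is predicted correctly
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def pixel_accuracy(mask_true, mask_pred):
    # fraction of pixels whose binary illusory-region label matches the ground-truth mask
    mask_true, mask_pred = np.asarray(mask_true), np.asarray(mask_pred)
    return float((mask_true == mask_pred).mean())
```

Both reduce to a mean over element-wise label agreement; the only difference is whether the unit of comparison is an image or a pixel.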
- Research Article
43
- 10.4028/www.scientific.net/jbbbe.42.79
- Jul 1, 2019
- Journal of Biomimetics, Biomaterials and Biomedical Engineering
In this paper, a modified adaptive K-means (MAKM) method is proposed to extract the region of interest (ROI) from local and public datasets. The local image datasets were collected from Bethezata General Hospital (BGH), and the public datasets are from the Mammographic Image Analysis Society (MIAS). Both datasets contain the same number of images: 112 abnormal and 208 normal. Two texture features (GLCM and Gabor) from the ROIs and one set of CNN-based features are considered in the experiment. The CNN features are extracted using an Inception-V3 pre-trained model after simple preprocessing and cropping. The quality of the features is evaluated both individually and after feature fusion, and five classifiers (SVM, KNN, MLP, RF, and NB) are used to measure the descriptive power of the features using cross-validation. The proposed approach was first evaluated on the local dataset and then applied to the public dataset. Classifier results are measured using accuracy, sensitivity, specificity, kappa, computation time, and AUC. The experimental analysis using GLCM features from the two datasets indicates that GLCM features from the BGH dataset outperformed those of the MIAS dataset across all five classifiers. However, Gabor features from the two datasets scored best with two classifiers (SVM and MLP). For BGH and MIAS, SVM scored accuracies of 99% and 97.46%, sensitivities of 99.48% and 96.26%, and specificities of 98.16% and 100%, respectively; MLP achieved accuracies of 97% and 87.64%, sensitivities of 97.40% and 96.65%, and specificities of 96.26% and 75.73%, respectively. The relatively best performance is achieved by fusing Gabor and CNN-based features with the MLP classifier. For GLCM texture features, KNN, MLP, RF, and NB achieved almost 100% performance, while SVM scored an accuracy of 96.88%, a sensitivity of 97.14%, and a specificity of 96.36%. Compared to the other classifiers, NB had the lowest computation time in all experiments.
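The GLCM texture features used above are statistics of a gray-level co-occurrence matrix: the normalized counts of gray-level pairs at a fixed pixel offset. A minimal numpy sketch of the idea (the paper would typically use a library implementation; this only illustrates the definition, with the "contrast" statistic as an example):

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    # normalized co-occurrence counts of gray-level pairs (i, j)
    # at pixel offset (dx, dy)
    g = np.zeros((levels, levels), dtype=float)
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[image[y, x], image[y + dy, x + dx]] += 1
    return g / g.sum()

def glcm_contrast(g):
    # GLCM contrast statistic: sum_{i,j} (i - j)^2 * P(i, j)
    i, j = np.indices(g.shape)
    return float(((i - j) ** 2 * g).sum())
```

A uniform image yields zero contrast, while a checkerboard (every horizontal neighbor differs by one gray level) yields the maximum for binary images.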
- Research Article
14
- 10.1177/11206721221096294
- Apr 25, 2022
- European Journal of Ophthalmology
The aim of the study is to improve the accuracy of classifying age-related macular degeneration (AMD) in its earlier phases with a proposed Capsule Network (CapsNet) architecture trained on speckle-noise-reduced spectral-domain optical coherence tomography (SD-OCT) images, using an optimized Bayesian non-local mean (OBNLM) filter and augmentation techniques. A total of 726 local SD-OCT images were collected and labelled as 159 drusen, 145 dry AMD, 156 wet AMD, and 266 normal. The region of interest (ROI) was identified, and speckle noise in the SD-OCT images was reduced with the OBNLM filter. The processed images were fed to the proposed CapsNet architecture to classify the SD-OCT images, and accuracy rates were calculated on both public and local datasets. Classification accuracy on the local SD-OCT image dataset reached 96.39% after data augmentation and OBNLM speckle-noise reduction. The performance of the proposed CapsNet was also evaluated on the public Kaggle dataset under the same processing procedures, yielding an accuracy of 98.07%, with sensitivity and specificity of 96.72% and 99.98%, respectively. The classification success of the proposed CapsNet may be improved by robust pre-processing steps such as ROI determination and OBNLM-based denoising of SD-OCT images. These image preprocessing steps yielded higher accuracy rates for determining different types of AMD, including its precursor lesion, on both the local and public datasets with the proposed CapsNet architecture.
- Book Chapter
- 10.1007/978-981-99-0923-0_29
- Jan 1, 2023
Macro-fungi, usually called mushrooms, play a significant role in the ecosystem and occupy ubiquitous ecological niches, with an estimated population of over a million species. However, traditional classification approaches for mushrooms require professional taxonomic knowledge, and the significant investment this demands has hindered progress: only ca. 20,000 species have been described so far, and applications are limited to general tasks such as poisonous mushroom identification. In this study, we propose an approach for automatic mushroom image recognition based on a deep convolutional neural network (DCNN). Attention mechanisms are combined with an efficient, lightweight MobileNetV3 backbone network to achieve high performance on the mushroom image classification task. Our models achieve up to 81.92% test accuracy on the public mushroom image dataset and 70.73% test accuracy on the local mushroom image dataset. Moreover, self-attention-based transformers are compared with lightweight DCNNs implementing attention mechanisms but do not achieve satisfactory performance on either the public or local dataset, which highlights the advantages of DCNNs for fine-grained biological image recognition. The proposed approach demonstrates great potential for real-time, automatic mushroom image processing, and the proposed automatic procedure will be a complementary and useful reference for traditional mushroom classification.
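The channel attention built into MobileNetV3 is the squeeze-and-excitation (SE) style of mechanism: global-average-pool each channel, pass the result through a small bottleneck MLP, and gate each channel with a sigmoid weight. A minimal numpy sketch of the idea (the weights `w1`, `w2` here are untrained placeholders, not the network's parameters):

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    # x: feature map of shape (C, H, W)
    # w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights
    s = x.mean(axis=(1, 2))               # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)           # reduction FC + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # expansion FC + sigmoid gates in (0, 1)
    return x * g[:, None, None]           # excite: channel-wise reweighting
```

Because the gates lie in (0, 1), the block can only rescale channels, never amplify them; its cost is a few small matrix-vector products, which is why it suits lightweight backbones.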
- Research Article
- 10.21928/uhdjst.v5n2y2021.pp26-31
- Aug 5, 2021
- UHD Journal of Science and Technology
Face recognition is an important topic in the security field: it identifies humans through physiological or behavioral biometric characteristics, and it can do so with high precision. One of the primary problems in face recognition is the recognition accuracy rate. Local datasets, rather than public datasets, are used to implement this research. A median filter is used to remove noise and identify errors, and it obtains a good accuracy rate without degrading image quality. In addition, filter processing is applied to enhance the images, and the discrete wavelet transform algorithm is used for feature extraction. The approach involves several steps: image acquisition, converting images to grayscale, cropping, and then feature extraction. To reach the final decision about the presented face, several comparison steps are applied. The results show a recognition accuracy of 91% on human faces.
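The pipeline above pairs median filtering for denoising with a discrete wavelet transform for features. A minimal numpy sketch of both steps (illustrative only, not the paper's implementation; the Haar wavelet is used as the simplest DWT basis):

```python
import numpy as np

def median_filter(img, k=3):
    # k x k median filter with edge replication: removes salt-and-pepper
    # noise without blurring edges the way a mean filter would
    pad = k // 2
    p = np.pad(np.asarray(img, dtype=float), pad, mode='edge')
    h, w = np.asarray(img).shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(p[y:y + k, x:x + k])
    return out

def haar_step(x):
    # one level of the 1-D Haar wavelet transform: approximation
    # (scaled local averages) and detail (scaled local differences)
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)
```

Feature extraction would typically keep the low-frequency approximation coefficients (recursively applied along rows and columns for 2-D images) as a compact descriptor of the face.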
- Research Article
2
- 10.5194/isprs-archives-xlii-2-w13-785-2019
- Jun 5, 2019
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. With the rapid development of new indoor sensors and acquisition techniques, the number of indoor three-dimensional (3D) point cloud models has increased significantly. However, these massive “blind” point clouds struggle to satisfy the demands of many location-based indoor applications and GIS analyses, and robust semantic segmentation of 3D point clouds remains a challenge. In this paper, a segmentation with layout estimation network (SLENet)-based 2D–3D semantic transfer method is proposed for robust segmentation of image-based indoor 3D point clouds. Firstly, a SLENet is devised to simultaneously obtain semantic labels and indoor spatial layout estimates from 2D images. A pixel labeling pool is then constructed to incorporate the visual graphical model and realize efficient 2D–3D semantic transfer for 3D point clouds, which avoids time-consuming pixel-wise label transfer and reprojection error. Finally, a 3D-contextual refinement, which exploits extra-image consistency under 3D constraints, is developed to suppress the labeling contradictions caused by multi-superpixel aggregation. The experiments were conducted on an open dataset (the NYUDv2 indoor dataset) and a local dataset. In comparison with state-of-the-art 2D semantic segmentation methods, SLENet learns features discriminative enough for inter-class segmentation while preserving clear boundaries for intra-class segmentation. Building on SLENet, the final 3D semantic segmentation of the point cloud created from the local image dataset reaches a total accuracy of 89.97%, expressing both object semantics and indoor structural information.
- Conference Article
7
- 10.1145/3441233.3441235
- Oct 9, 2020
Many municipalities and local road authorities seek to implement automated evaluation of road damage, but they often lack the technology, know-how, and funds to afford state-of-the-art equipment for collecting and analyzing road deficiencies. This paper describes the development of a localized road damage detection model using transfer learning and assesses its usability for training a detection model from a local road image dataset of limited size. The localized road damage dataset was created by capturing 3,923 Czech and Slovak road images containing 5,072 instances of road damage, using a smartphone mounted on the vehicle's windshield. A supervised neural network was then trained on this dataset, which was labeled by experts. A pre-trained MobileNet model developed by the University of Tokyo and the transfer learning method were employed to accelerate training and to improve the model's performance on a relatively small, localized dataset. Finally, the performance of the developed road damage detection model was analyzed. The results show that it is possible to capture road damage into preset classes with F1-scores ranging between 45% and 98%. Further improvement in the detection rate can be achieved by increasing the training dataset size. The developed road damage detection model is publicly available at https://github.com/amraz39/RoadDamageDetectorCZ and shows the high potential of deep neural networks for road damage detection by local road agencies.
- Research Article
3
- 10.1016/j.medengphy.2024.104162
- Mar 29, 2024
- Medical engineering & physics
Automatic left ventricle volume and mass quantification from 2D cine-MRI: Investigating papillary muscle influence
- Research Article
8
- 10.1080/2150704x.2013.781286
- Jul 1, 2013
- Remote Sensing Letters
Mobile apps in information and communication technology are regarded as a dominant trend across many application fields in science and engineering. However, most mobile apps dealing with satellite image datasets remain limited; typically, air photos or high-resolution satellite images are used only as the background for other content services. Given the current status and latent possibilities of mobile analytics, it is necessary to design and implement a practical mobile app for satellite image utilization. The mobile device used in this study is an Android smartphone. The main functionalities of the app are geo-metadata searching and browsing interlinked with an embedded database, vector layer overlay from users' local datasets, actual image processing driven by mobile gestures, and geo-database server connection. In particular, the satellite image processing examples provided by this approach emphasize filtering and detection, such as line segmentation, edge detection, and corner point detection. The app can be extended to customize and optimize target applications according to further user requirements. Furthermore, all tasks on both the mobile client and the server are carried out on open source stacks. This study can serve as a meaningful attempt and an application model for developing more practical mobile apps for remote sensing images and geo-based content.
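Of the processing functions listed above, edge detection is the most standard: convolve the image with horizontal and vertical Sobel kernels and take the gradient magnitude. A minimal numpy sketch of the technique (illustrative; the app itself runs on Android and server stacks, not this code):

```python
import numpy as np

def sobel_edges(img):
    # gradient magnitude via the 3x3 Sobel kernels; strong responses
    # mark intensity edges in the image
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(np.asarray(img, dtype=float), 1, mode='edge')
    h, w = np.asarray(img).shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = p[y:y + 3, x:x + 3]
            gx[y, x] = (win * kx).sum()   # horizontal gradient
            gy[y, x] = (win * ky).sum()   # vertical gradient
    return np.hypot(gx, gy)
```

Flat regions produce zero response; a step edge produces a strong response along its boundary, which downstream steps (line segmentation, corner detection) then build on.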
- Research Article
11
- 10.1016/j.inffus.2023.102181
- Dec 5, 2023
- Information Fusion
AFPILD: Acoustic footstep dataset collected using one microphone array and LiDAR sensor for person identification and localization
- Research Article
20
- 10.1007/s00521-021-05908-9
- Mar 24, 2021
- Neural Computing and Applications
Medical image segmentation plays an important role in many clinical settings, such as medical diagnosis and computer-assisted treatment. Due to large quality differences and variable lesion areas with complex shapes, medical image segmentation is a very challenging task, yet most recent deep learning methods ignore global context information and the receptive fields of pixels, and do not consider the reuse of pixel features during the feature extraction stage. In this paper, we propose DGFAU-Net, an encoder–decoder structured 2D segmentation model, to overcome these shortcomings. In the encoder, DenseNet and AtrousCNN networks are leveraged to extract image features: the DenseNet network is mainly used to achieve the reuse of pixel features, and AtrousCNN is utilized to enlarge the receptive field of pixels. In the decoder, two modules are proposed: global feature attention upsample (GFAU) and pyramid pooling attention squeeze-excitation (PPASE). GFAU combines low-level and high-level features to generate new features with richer information, improving the pixels' perception of global context. PPASE employs a multi-scale pooling layer to enlarge the pixel's receptive field. In addition, Focal loss is used to balance the difference between samples of lesion and non-lesion areas. Extensive experiments are conducted on one local dataset (MRI images of carotid plaque) and two public datasets (the DRIVE and CHASE_DB1 vessel segmentation datasets), and the experimental results demonstrate the effectiveness of the proposed model.
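The Focal loss mentioned above handles the lesion/non-lesion imbalance by scaling cross-entropy with a factor that shrinks the contribution of well-classified examples. A minimal binary sketch in numpy (the alpha and gamma defaults follow the common convention, not necessarily the authors' settings):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # binary focal loss: cross-entropy scaled by (1 - p_t)^gamma, which
    # down-weights easy examples so rare lesion pixels dominate the gradient
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1 - p)              # prob assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return float((-alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean())
```

With gamma = 0 this reduces to class-weighted cross-entropy; raising gamma suppresses easy examples more aggressively.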
- Conference Article
3
- 10.1109/ist.2016.7738272
- Oct 1, 2016
Hypertensive retinopathy is an eye disease that can cause loss of vision in severe cases. Retinal blood vessels are highly affected by this disease, and the ratio between the diameters of arterioles and venules (AVR) is the key measure for detecting vessel abnormalities. The proposed system identifies symptoms of abnormal vascular structure such as arteriolar narrowing, with AVR calculation as its main component. The processing steps include background preprocessing, vessel segmentation, optic disc (OD) segmentation, vessel classification, and vessel width calculation, from which the AVR is estimated. The system classifies vessels into arteries and veins, which is the core of the AVR measurement; higher vessel classification accuracy is observed when SVM is used as the classifier. Accurate classification leads to better grading of the different stages of hypertensive retinopathy in fundus images. The algorithm is tested on three fundus image datasets (INSPIRE-AVR, VICAVR, and a local dataset), and the results compare favorably with other techniques.
- Research Article
12
- 10.1016/j.dib.2021.107116
- May 8, 2021
- Data in Brief
Dataset for localization and classification of Medjool dates in digital images
- Conference Article
97
- 10.1109/wacv.2018.00123
- Mar 1, 2018
This paper proposes a 5-component detection pipeline for use in a computer vision-based animal recognition system. The end result of our proposed pipeline is a collection of novel annotations of interest (AoI) with species and view-point labels. These AoIs, for example, could be fed as the focused input data into an appearance-based animal identification system. The goal of our method is to increase the reliability and automation of animal censusing studies and to provide better ecological information to conservationists. Our method is able to achieve a localization mAP of 81.67%, a species and viewpoint annotation classification accuracy of 94.28% and 87.11%, respectively, and an AoI accuracy of 72.75% across 6 animal species of interest. We also introduce the Wildlife Image and Localization Dataset (WILD), which contains 5,784 images and 12,007 labeled annotations across 28 classification species and a variety of challenging, real-world detection scenarios.
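The localization mAP reported above rests on intersection-over-union (IoU) matching between predicted and ground-truth bounding boxes. A minimal sketch of the standard IoU definition (not the paper's code):

```python
def iou(a, b):
    # intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold (commonly 0.5), and mAP averages precision over recall levels and classes under that matching rule.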
- Research Article
5
- 10.1093/mnras/stad3815
- Dec 15, 2023
- Monthly Notices of the Royal Astronomical Society
The Chinese Space Station Telescope (CSST) is a future advanced space telescope. Real-time identification of galaxy and nebula/star cluster (NSC) images is of great value during the CSST survey. While recent research on celestial object recognition has progressed, rapid and efficient identification of high-resolution local celestial images remains challenging. In this study, we conducted galaxy and NSC image classification research using deep learning methods based on data from the Hubble Space Telescope. We built a local celestial image dataset and designed a deep learning model named HR-CelestialNet for classifying galaxy and NSC images. HR-CelestialNet achieved an accuracy of 89.09 per cent on the testing set, outperforming models such as AlexNet, VGGNet, and ResNet, while demonstrating faster recognition speeds. Furthermore, we investigated the factors influencing CSST image quality and evaluated the generalization ability of HR-CelestialNet on a blurry image dataset, demonstrating its robustness to low image quality. The proposed method can enable real-time identification of celestial images during the CSST survey mission.