ARID

Abstract

Instance segmentation algorithms are used widely, from self-driving cars and scene mapping by autonomous robots to the analysis of medical scans. Instance segmentation can be thought of as a further refinement of semantic segmentation. Object detection algorithms detect objects in a scene by enclosing them in bounding boxes, semantic segmentation labels these objects, and instance segmentation labels each unique instance of these objects. The task is complex and becomes even more challenging for microscopic data. Objects in microscopic data do not usually follow a fixed shape or orientation, so it is very difficult to identify unique instances of these objects using axis-aligned bounding boxes. The alternative approach that researchers take is to make pixel-wise predictions and then agglomerate them to obtain the final object instances. In this thesis we present a novel loss function, which we use to train a U-Net that predicts n-dimensional embedding maps, or ARID (Affinity Representing Instance Descriptors). These embedding vectors contain dense information that can then be used to generate segmentation maps through post-processing. Previous methods have attempted to learn affinities but are prone to errors, resulting in erroneous segmentation. We show that our segmentation pipeline using ARID embedding maps surpasses the performance of affinity-based networks and solves the problem of merge errors. Our segmentation pipeline has two phases. The first is predicting the ARID embedding, for which we trained a U-Net architecture using an ultrametric loss; multiple configurations were tested and compared. The second phase is post-processing, which is further divided into two steps: segmentation generation and refinement.
We present a very basic technique that generates a Euclidean minimum spanning tree and prunes the edges whose distance exceeds a provided threshold to generate the segmentation. The other part of the post-processing pipeline is segmentation refinement, for which we propose approaches to refine the generated segmentation. To evaluate performance, we use Average Precision (AP) at IoU thresholds ranging from 0.5 to 0.95 in increments of 0.05. The best average AP0.5 score obtained with the affinity-based networks is 0.63, whereas our segmentation pipeline generates segmentation maps with a best average AP0.5 score of 0.826, surpassing the affinity-based networks. We also show the failure modes of our proposed loss function and outline the future scope of research in the field. Embedding-based approaches show promise for efficient instance segmentation, especially in complex scenes such as those in microscopic data. The generalized loss function presented in this thesis is capable of this task and offers a better alternative to affinity-based methods for segmentation.--Author's abstract
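
The segmentation-generation step described in the abstract (build a Euclidean minimum spanning tree over the embeddings, prune edges longer than a threshold, and read off the remaining connected groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names `emst_edges` and `segment`, the toy embeddings, and the threshold value are all hypothetical, and the quadratic-time Prim's algorithm used here would be too slow for full-resolution embedding maps.

```python
import math


def emst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (i, j, distance) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the connection cost of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def segment(points, threshold):
    """Prune tree edges longer than `threshold`; label the remaining components."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, d in emst_edges(points):
        if d <= threshold:       # keep short edges, drop long ones
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy 2-D "embeddings": three points near the origin, two near (5, 5).
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
print(segment(embeddings, threshold=0.5))
```

In this sketch each embedding vector would correspond to one pixel, so the returned labels form the instance map. Pruning a minimum spanning tree at a fixed distance and taking connected components is equivalent to single-linkage clustering cut at that distance, which is why a single threshold controls how readily nearby instances merge.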

Similar Papers
  • Conference Article
  • Citations: 2
  • 10.1109/icassp39728.2021.9414312
Instance Segmentation with the Number of Clusters Incorporated in Embedding Learning
  • Jun 6, 2021
  • Jianfeng Cao + 1 more

Semantic and instance segmentation algorithms are two general yet distinct image segmentation solutions powered by convolutional neural networks. While semantic segmentation benefits extensively from the end-to-end training strategy, instance segmentation is frequently framed as a multi-stage task, supported by learning-based discrimination and post-process clustering. Independent optimization of the substages leads to an accumulation of segmentation errors. In this work, we propose to embed prior clustering information into an embedding learning framework, FCRNet, stimulating one-stage instance segmentation. FCRNet relieves the complexity of post-processing by incorporating the number of clustering groups into the embedding space. The superior performance of FCRNet is verified and compared with other methods on the nucleus dataset BBBC006.

  • Research Article
  • Citations: 41
  • 10.1109/tim.2021.3121485
A Lightweight Adaptive RoI Extraction Network for Precise Aerial Image Instance Segmentation
  • Jan 1, 2021
  • IEEE Transactions on Instrumentation and Measurement
  • Xiangfeng Zeng + 3 more

Bounding boxes have been widely used in aerial object detection for their simplicity. They perform instance-level localization with the coordinates and orientation of each target. However, defects such as coarse edge information impede semantic interpretation in earth observation. Besides, in terms of the aerial imaging instruments, it is essential to recognize the exterior appearance and contour of the objects. In this paper, we propose a novel aerial instance segmentation method termed adaptive RoI extraction network (ARE-Net), which bridges the gap of accurately delineating instances under the complex background of aerial images. To perform instance segmentation under the particular properties of aerial images, e.g., complex background and densely distributed instances, RoIs are pooled from multi-level feature maps and integral region proposals. On this basis, a global attention RoI extractor (GA-RoIE) and a perceptual RoI extractor (PRoIE) are respectively introduced for the detection branch and the mask branch to perform adaptive RoI extraction for aerial images. Meanwhile, to reconcile the probability distribution regional distribution of pixel-wise prediction in aerial images, we present the Adaptive Compound loss function to improve the integrating degree of the predicted binary mask to the ground truth mask. Additionally, we adopt RegNetx with Deformable Convolution to optimize ARE-Net, and name it R-ARE-Net. Despite implementing pixel-wise prediction, comprehensive experiments on the iSAID and NWPU VHR-10 instance segmentation datasets have verified the effectiveness and efficiency of ARE-Net and R-ARE-Net. Experimental results indicate that our proposed methods achieve the highest AP value (38.0% AP on iSAID and 64.2% AP on the NWPU VHR-10 instance segmentation dataset) and the lowest FLOPs and Parameters consumption (~46% reduced FLOPs and 61.5% reduced Parameters than SCNet) among the mainstream methods. Besides, the false alarms, missing segmentations, poorly predicted masks, and under-segmentations that appeared in the mainstream methods can be avoided to some extent with R-ARE-Net.

  • Research Article
  • Citations: 30
  • 10.1109/access.2020.3003917
Syncretic-NMS: A Merging Non-Maximum Suppression Algorithm for Instance Segmentation
  • Jan 1, 2020
  • IEEE Access
  • Jun Chu + 4 more

Instance segmentation is typically based on an object detection framework. Semantic segmentation is conducted on the bounding boxes that are returned by detectors. NMS (non-maximum suppression) is a common post-processing operation in instance segmentation and object detection tasks. It is typically used after bounding box regression to eliminate redundant bounding boxes. The evaluation criteria for object detection require that the bounding box be as close as possible to the ground truth, but they do not emphasize the integrity of the included object. However, sometimes the bounding boxes cannot contain the complete objects, and the parts beyond the bounding boxes cannot be correctly predicted in the subsequent semantic segmentation. To solve this problem, we propose the Syncretic-NMS algorithm. The algorithm takes traditional NMS as the first step and processes the bounding boxes obtained by traditional NMS, judges the neighboring bounding boxes of each bounding box, and combines the neighboring boxes that are strongly correlated with the corresponding bounding boxes. The coordinates of the merged box are the four coordinate extremes of the bounding box and the highly relevant neighboring box. The neighboring box with strong correlation is merged with the corresponding bounding box. Based on an analysis of the influences of corresponding factors, the criteria for correlation judgment are specified. Experimental results on the MS COCO dataset demonstrate that Syncretic-NMS can steadily increase the accuracy of instance segmentation, while experimental results on the Cityscapes dataset prove that the algorithm can adapt to application scenario changes. The computational complexity of Syncretic-NMS is the same as that of traditional NMS. Syncretic-NMS is easy to implement, requires no additional training, and can be easily integrated into the available instance segmentation framework.
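
The merging step in this abstract, where the merged box takes the four coordinate extremes of a box and its strongly correlated neighbor, can be sketched as follows. This is only an illustration under stated assumptions: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the names `merge_boxes` and `merge_correlated` are hypothetical, and the correlation criteria are not given in the abstract, so `is_correlated` is a caller-supplied placeholder (a plain overlap test in the example) rather than the paper's rule.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def merge_boxes(box: Box, neighbor: Box) -> Box:
    """Return the box spanned by the coordinate extremes of both inputs."""
    return (
        min(box[0], neighbor[0]),
        min(box[1], neighbor[1]),
        max(box[2], neighbor[2]),
        max(box[3], neighbor[3]),
    )


def merge_correlated(kept: List[Box],
                     is_correlated: Callable[[Box, Box], bool]) -> List[Box]:
    """Expand each box kept by NMS with the neighbors judged correlated to it."""
    merged = []
    for i, box in enumerate(kept):
        out = box
        for j, other in enumerate(kept):
            # The predicate is always evaluated against the original box.
            if i != j and is_correlated(box, other):
                out = merge_boxes(out, other)
        merged.append(out)
    return merged


def overlaps(a: Box, b: Box) -> bool:
    """Placeholder criterion for the example: do the two boxes intersect?"""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


boxes = [(10, 10, 50, 50), (40, 5, 80, 45), (200, 200, 220, 220)]
print(merge_correlated(boxes, overlaps))
```

Because the sketch post-processes boxes that traditional NMS has already kept and needs no learned parameters, it is consistent with the abstract's description of the method as requiring no additional training.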

  • Conference Article
  • 10.1117/12.2622448
Combination of visual and semantic criteria for automated selection of region proposals in a bounding box
  • Mar 5, 2022
  • Mohamed-Hicham Leghettas + 3 more

Deep learning based techniques have been widely used for semantic segmentation. The underlying voluminous DNN models are trained on large datasets that have been annotated at the pixel level by humans. Such low-level annotation tasks are expensive to obtain for newly collected datasets. Alternatively, we propose ComViSe, a segmentation pipeline that requires only high-level annotations that remain relatively accessible (e.g., bounding boxes and labels of a detection, labels of a legend) to segment a given image. ComViSe embeds a segmentation framework, pre-trained on a semantically different dataset, to generate image region proposals. The pipeline relies then on several semantic, visual and geometric criteria to characterize each proposed region, and combines them to select the optimal segmentation mask, comparing diverse aggregation strategies from handcrafted formula to automatic ones, supervised or not. An experimental study conducted on the PASCAL VOC dataset shows that these effectively combined criteria are enough to select the mask proposals with the best IoU score in most cases, and that the aggregation can be done automatically.

  • Research Article
  • Citations: 15
  • 10.1162/neco_a_01416
Least kth-Order and Rényi Generative Adversarial Networks.
  • Aug 19, 2021
  • Neural Computation
  • Himesh Bhatia + 4 more

We investigate the use of parameterized families of information-theoretic measures to generalize the loss functions of generative adversarial networks (GANs) with the objective of improving performance. A new generator loss function, least kth-order GAN (LkGAN), is introduced, generalizing the least squares GANs (LSGANs) by using a kth-order absolute error distortion measure with k≥1 (which recovers the LSGAN loss function when k=2). It is shown that minimizing this generalized loss function under an (unconstrained) optimal discriminator is equivalent to minimizing the kth-order Pearson-Vajda divergence. Another novel GAN generator loss function is next proposed in terms of Rényi cross-entropy functionals with order α>0, α≠1. It is demonstrated that this Rényi-centric generalized loss function, which provably reduces to the original GAN loss function as α→1, preserves the equilibrium point satisfied by the original GAN based on the Jensen-Rényi divergence, a natural extension of the Jensen-Shannon divergence. Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA data sets, under both DCGAN and StyleGAN architectures, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters k and α, respectively. More specifically, experiments show improvements with regard to the quality of the generated images as measured by the Fréchet inception distance score and training stability. While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, for example, the issues of fairness or privacy in artificial intelligence.

  • Conference Article
  • Citations: 5
  • 10.1109/ispcc53510.2021.9609379
Infection Segmentation of Leaves Using Deep Learning techniques to enhance crop productivity in smart agriculture
  • Oct 7, 2021
  • Nikhitha Karennagari + 5 more

Agriculture has been playing a vital role in human existence. Several new techniques have been invented and discovered to increase crop productivity. Along with the increase in crop production, the problems related to disease/infection in the crop have also increased. The farmer may sometimes have little or no knowledge about the infection, or may be unable to identify the small traces of infection on the leaves. Infection classification, detection, and segmentation play a vital role in helping farmers identify the infection at its budding stage and take the required remedies for it. Previous approaches in this context were based on classification and detection only. These approaches have certain limitations, and they did not specify the infection in its exact proportion. To overcome the limitations of previous approaches, a segmentation approach can be used to accurately segment all infected spots on the leaves in their exact shape. Instance segmentation of the leaf disease helps to solve the overlapping bounding boxes problem, as it segments the infected spots with different colors showing the difference. The advantage of segmentation further covers classifying the infection type, localizing it with the help of a bounding box, and segmenting the infected areas using colored masks. Mask R-CNN is an instance segmentation algorithm that can be used to solve the aforementioned problem by processing through several layers of convolutional neural networks. Instance segmentation is difficult because it necessitates both accurate detection of all objects in an image and exact segmentation of each instance. This helps the farmer identify the infection on the leaf without prior knowledge about the type of infection that has affected the crop, and take the required remedy to stop the infection. Moreover, this approach eliminates the burden of manual intervention in identifying the infection.

  • Research Article
  • Citations: 29
  • 10.3390/rs14030531
Efficient Instance Segmentation Paradigm for Interpreting SAR and Optical Images
  • Jan 23, 2022
  • Remote Sensing
  • Fan Fan + 6 more

Instance segmentation in remote sensing images is challenging due to the object-level discrimination and pixel-level segmentation for the objects. In remote sensing applications, instance segmentation adopts the instance-aware mask, rather than horizontal bounding box and oriented bounding box in object detection, or category-aware mask in semantic segmentation, to interpret the objects with the boundaries. Despite these distinct advantages, versatile instance segmentation methods are still to be discovered for remote sensing images. In this paper, an efficient instance segmentation paradigm (EISP) for interpreting the synthetic aperture radar (SAR) and optical images is proposed. EISP mainly consists of the Swin Transformer to construct the hierarchical features of SAR and optical images, the context information flow (CIF) for interweaving the semantic features from the bounding box branch to mask branch, and the confluent loss function for refining the predicted masks. Experimental conclusions can be drawn on the PSeg-SSDD (Polygon Segmentation—SAR Ship Detection Dataset) and NWPU VHR-10 instance segmentation dataset (optical dataset): (1) Swin-L, CIF, and confluent loss function in EISP acts on the whole instance segmentation utility; (2) EISP* exceeds vanilla mask R-CNN 4.2% AP value on PSeg-SSDD and 11.2% AP on NWPU VHR-10 instance segmentation dataset; (3) The poorly segmented masks, false alarms, missing segmentations, and aliasing masks can be avoided to a great extent for EISP* in segmenting the SAR and optical images; (4) EISP* achieves the highest instance segmentation AP value compared to the state-of-the-art instance segmentation methods.

  • Research Article
  • Citations: 37
  • 10.1109/tcsvt.2021.3063377
Segmenting Beyond the Bounding Box for Instance Segmentation
  • Mar 5, 2021
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Xiaoliang Zhang + 4 more

Instance segmentation needs to locate all instances in an image correctly and segment each instance precisely. Currently, the most dominant methods for instance segmentation take object detection as a pre-task. However, they rely heavily on the accuracy of object detection. If the pre-task cannot predict an accurate bounding box, the performance of instance segmentation will degenerate. In this paper, we present a novel method for instance segmentation to solve this problem, which is called Segmenting Beyond the Bounding Box (S3B-Net). Our S3B-Net designs a sub-network to help instance segmentation methods based on object detection to segment the part of an instance beyond the bounding box. Specifically, the sub-network first predicts a two-dimensional pixel embedding for each pixel. Then, a Gaussian function is employed to calculate the probability that a pixel belongs to a corresponding instance according to the two-dimensional pixel embedding. Finally, the output of the sub-network is combined with the output of instance segmentation based on object detection to generate a more precise instance mask. Our sub-network can easily extend existing instance segmentation methods based on object detection to segment instances beyond the bounding box. We run our experiments on dominant instance segmentation datasets, such as the COCO dataset and the Cityscapes dataset. The results show that our method achieves a 6.8-point gain over the baseline Mask R-CNN with ResNet-50-FPN on the Cityscapes dataset, and a 1.7-point gain with ResNet-101-FPN-DCN on the COCO dataset. Our S3B-Net outperforms the previous state-of-the-art instance segmentation method, which proves our method is competitive. The source code of our method will be made available.

  • Research Article
  • Citations: 1
  • 10.2478/cait-2025-0022
Unification of Semantic and Instance Segmentation with BoundaryX
  • Sep 1, 2025
  • Cybernetics and Information Technologies
  • Teodor Boyadzhiev + 1 more

Semantic segmentation is a field of image content recognition in which each pixel is classified according to the type of object it belongs to, while instance segmentation distinguishes individual object instances. A novel method, BoundaryX, is proposed to unify both tasks without relying on bounding boxes. Each pixel is classified, and boundaries are drawn around separate instances, enabling easy bounding box calculation without shape constraints or region proposals. Both instanced objects (like people) and non-instanced ones (like the sky) are handled by BoundaryX, without hardcoded exceptions. The quality of the method was evaluated on the COCO dataset for the class “people” by measuring Intersection over Union (IoU) for the semantic segmentation and bounding boxes recall and precision. The method achieved 0.774 IoU for semantic segmentation, 75% recall, and 83% precision for bounding box quality. Segmentation pipelines are simplified through the unified solution and flexible boundary-based representation provided by BoundaryX.

  • Research Article
  • Citations: 8
  • 10.3390/app14093623
An Instance Segmentation Method for Insulator Defects Based on an Attention Mechanism and Feature Fusion Network
  • Apr 25, 2024
  • Applied Sciences
  • Junpeng Wu + 4 more

Among the existing insulator defect detection methods, the automatic detection of inspection robots based on the instance segmentation algorithm is relatively more efficient, but the problem of the limited accuracy of the segmentation algorithm is still a bottleneck for increasing inspection efficiency. Therefore, we propose a single-stage insulator instance defect segmentation method based on both an attention mechanism and improved feature fusion network. YOLACT is selected as the basic instance segmentation model. Firstly, to improve the segmentation speed, MobileNetV2 embedded with an scSE attention mechanism is introduced as the backbone network. Secondly, a new feature map that combines semantic and positional information is obtained by improving the FPN module and fusing the feature maps of each layer, during which, an attention mechanism is introduced to further improve the quality of the feature map. Thirdly, in view of the problems that affect the insulator segmentation, a Restrained-IoU (RIoU) bounding box loss function which covers the area deviation, center deviation, and shape deviation is designed for object detection. Finally, for the validity evaluation of the proposed method, experiments are performed on the insulator defect data set. It is shown in the results that the improved algorithm achieves a mask accuracy improvement of 5.82% and a detection speed of 37.4 FPS, which better complete the instance segmentation of insulator defect images.

  • Research Article
  • Citations: 2
  • 10.3390/info16010063
Object Detection Post Processing Accelerator Based on Co-Design of Hardware and Software
  • Jan 17, 2025
  • Information
  • Dengtian Yang + 3 more

Deep learning significantly advances object detection. Post processes, a critical component of this process, select valid bounding boxes to represent the true targets during inference and assign boxes and labels to these objects during training to optimize the loss function. However, post processes constitute a substantial portion of the total processing time for a single image. This inefficiency primarily arises from the extensive Intersection over Union (IoU) calculations required between numerous redundant bounding boxes in post processing algorithms. To reduce these redundant IoU calculations, we introduce a classification prioritization strategy during both training and inference post processes. Additionally, post processes involve sorting operations that contribute to their inefficiency. To minimize unnecessary comparisons in Top-K sorting, we have improved the bitonic sorter by developing a hybrid bitonic algorithm. These improvements have effectively accelerated the post processing. Given the similarities between the training and inference post processes, we unify four typical post processing algorithms and design a hardware accelerator based on this framework. Our accelerator achieves at least 7.55 times the speed in inference post processing compared to that of recent accelerators. When compared to the RTX 2080 Ti system, our proposed accelerator offers at least 21.93 times the speed for the training post process and 19.89 times for the inference post process, thereby significantly enhancing the efficiency of loss function minimization.

  • Preprint Article
  • 10.20944/preprints202412.0438.v1
Object Detection Post-Processing Accelerator Based on Co-Design of Hardware and Software
  • Dec 5, 2024
  • Preprints.org
  • Dengtian Yang + 5 more

Deep learning significantly advances object detection. Post process, a critical component of this process, selects valid bounding boxes to represent true targets during inference and assigns boxes and labels to these objects during training to optimize the loss function. However, post process constitutes a substantial portion of the total processing time for a single image. This inefficiency primarily arises from the extensive Intersection over Union (IoU) calculations required between numerous redundant bounding boxes in post-processing algorithms. To reduce the redundant IoU calculations, we introduce a classification prioritization strategy during both training and inference post processes. Additionally, post process involves sorting operations that contribute to inefficiency. To minimize unnecessary comparisons in Top-K sorting, we have improved the bitonic sorter by developing a hybrid bitonic algorithm. These improvements have effectively accelerated post process. Given the similarities between training and inference post processes, we unify four typical post-processing algorithms and design a hardware accelerator based on this framework. Our accelerator achieves at least 7.55 times the speed in inference post process compared to recent accelerators. When compared to the RTX 2080 Ti system, our proposed accelerator offers at least 21.93 times the speed for training post process and 19.89 times for inference post process, thereby significantly enhancing the efficiency of loss function minimization.

  • Research Article
  • Citations: 21
  • 10.3390/rs13142788
CPISNet: Delving into Consistent Proposals of Instance Segmentation Network for High-Resolution Aerial Images
  • Jul 15, 2021
  • Remote Sensing
  • Xiangfeng Zeng + 6 more

Instance segmentation of high-resolution aerial images is challenging when compared to object detection and semantic segmentation in remote sensing applications. It adopts boundary-aware mask predictions, instead of traditional bounding boxes, to locate the objects-of-interest in pixel-wise. Meanwhile, instance segmentation can distinguish the densely distributed objects within a certain category by a different color, which is unavailable in semantic segmentation. Despite the distinct advantages, there are rare methods which are dedicated to the high-quality instance segmentation for high-resolution aerial images. In this paper, a novel instance segmentation method, termed consistent proposals of instance segmentation network (CPISNet), for high-resolution aerial images is proposed. Following top-down instance segmentation formula, it adopts the adaptive feature extraction network (AFEN) to extract the multi-level bottom-up augmented feature maps in design space level. Then, elaborated RoI extractor (ERoIE) is designed to extract the mask RoIs via the refined bounding boxes from proposal consistent cascaded (PCC) architecture and multi-level features from AFEN. Finally, the convolution block with shortcut connection is responsible for generating the binary mask for instance segmentation. Experimental conclusions can be drawn on the iSAID and NWPU VHR-10 instance segmentation dataset: (1) Each individual module in CPISNet acts on the whole instance segmentation utility; (2) CPISNet* exceeds vanilla Mask R-CNN 3.4%/3.8% AP on iSAID validation/test set and 9.2% AP on NWPU VHR-10 instance segmentation dataset; (3) The aliasing masks, missing segmentations, false alarms, and poorly segmented masks can be avoided to some extent for CPISNet; (4) CPISNet receives high precision of instance segmentation for aerial images and interprets the objects with fitting boundary.

  • Research Article
  • Citations: 5
  • 10.1609/aaai.v36i1.19880
Joint Human Pose Estimation and Instance Segmentation with PosePlusSeg
  • Jun 28, 2022
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Niaz Ahmad + 3 more

Despite the advances in multi-person pose estimation, state-of-the-art techniques only deliver the human pose structure. Yet, they do not leverage the keypoints of human pose to deliver whole-body shape information for human instance segmentation. This paper presents PosePlusSeg, a joint model designed for both human pose estimation and instance segmentation. For pose estimation, PosePlusSeg first takes a bottom-up approach to detect the soft and hard keypoints of individuals by producing a strong keypoint heat map, then improves the keypoint detection confidence score by producing a body heat map. For instance segmentation, PosePlusSeg generates a mask offset where a keypoint is defined as a centroid for the pixels in the embedding space, enabling instance-level segmentation for the human class. Finally, we propose a new pose and instance segmentation algorithm that enables PosePlusSeg to determine the joint structure of the human pose and instance segmentation. Experiments using the challenging COCO dataset demonstrate that PosePlusSeg copes better with challenging scenarios, like occlusions, entangled limbs, and overlapped people. PosePlusSeg outperforms state-of-the-art detection-based approaches, achieving a 0.728 mAP for human pose estimation and a 0.445 mAP for instance segmentation. Code has been made available at: https://github.com/RaiseLab/PosePlusSeg.

  • Research Article
  • Citations: 1
  • 10.1016/j.iswa.2024.200454
Pretraining instance segmentation models with bounding box annotations
  • Oct 28, 2024
  • Intelligent Systems with Applications
  • Cathaoir Agnew + 5 more

