InstantDL: an easy-to-use deep learning pipeline for image segmentation and classification
Background: Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application. Results: We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented. Conclusions: With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.
- Book Chapter
79
- 10.1007/978-3-030-33676-9_4
- Jan 1, 2019
Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. A lot of progress was made in the field of object classification and semantic segmentation. However, the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird's-eye view representation.
- Conference Article
13
- 10.1109/wacv.2019.00194
- Jan 1, 2019
Focusing only on the semantic instances that are salient in a scene benefits robot navigation and self-driving cars more than looking at all objects in the whole scene. This paper pushes the envelope on salient regions in a video by decomposing them into semantically meaningful components, namely, semantic salient instances. We provide a baseline for the new task of video semantic salient instance segmentation (VSSIS): the Semantic Instance - Salient Object (SISO) framework. The SISO framework is simple yet efficient, leveraging the advantages of two different segmentation tasks, i.e. semantic instance segmentation and salient object segmentation, and fusing them for the final result. In SISO, we introduce a sequential fusion that looks at overlapping pixels between semantic instances and salient regions to obtain non-overlapping instances one by one. We also introduce a recurrent instance propagation to refine the shapes and semantic meanings of instances, and an identity tracking to maintain both the identity and the semantic meaning of instances over the entire video. Experimental results demonstrate the effectiveness of our SISO baseline, which can handle occlusions in videos. In addition, to tackle the task of VSSIS, we augment the DAVIS-2017 benchmark dataset by assigning semantic ground-truth to salient instance labels, obtaining the SEmantic Salient Instance Video (SESIV) dataset. Our SESIV dataset consists of 84 high-quality video sequences with pixel-wise, per-frame ground-truth labels.
- Research Article
- 10.20532/cit.2023.1005740
- Apr 4, 2024
- Journal of Computing and Information Technology
In this study, we propose a segmentation model based on convolutional neural networks (CNNs) to address image segmentation challenges in computer vision. Prior to designing the model, the activation function and other modules of the convolutional neural network were optimized to meet specific requirements. The segmentation task was transformed into a binary classification problem to simplify network calculations and improve efficiency. Additionally, the model used a mask map obtained from the semantic segmentation model to aid in instance segmentation. Class activation technology was introduced to extract feature maps, and the corresponding heat maps were obtained to achieve target instance segmentation. To further validate the effectiveness of the segmentation model, simulation experiments were conducted on semantic segmentation and instance segmentation respectively. The results show that the accuracy of the basic semantic segmentation model reached 87.58%, while the average accuracy across all classes of the optimized instance segmentation model reached 97.9%. The proposed image segmentation models therefore demonstrate high accuracy and good robustness.
- Research Article
- 10.11834/jig.210337
- Jan 1, 2023
- Journal of Image and Graphics
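The goal of computer vision is to build computational models that approach the human visual system. With the development of deep neural networks (DNNs), the analysis and understanding of high-level semantics in computer vision has become a research focus. High-level semantics in computer vision are usually human-understandable, expressible descriptors of the content of media signals such as images and videos; typical high-level semantic analysis tasks include image classification, object detection, instance segmentation, semantic segmentation, video scene recognition, and object tracking. Algorithms based on deep neural networks have steadily improved performance on computer vision tasks, but this has come with growing model size and declining computational efficiency. Model distillation is a model compression approach based on transfer learning. Such approaches typically use a pretrained model as the teacher, extract its useful representations, such as model outputs, hidden-layer features, or similarities between features, and use these representations as additional supervision signals to train a smaller student model with faster inference, with the aim of improving the small model's performance so that it can replace the large one. Model distillation offers a good trade-off between model performance and computational complexity, and is therefore increasingly used in deep-learning-based high-level semantic analysis. Since the concept of model distillation was proposed in 2014, researchers have developed a large number of model distillation methods for high-level semantic analysis, most widely applied to image classification, object detection, and semantic segmentation. This paper surveys and summarizes representative model distillation schemes for these typical tasks, organized by vision task. First, starting from model distillation methods for classification, the most mature and widely applied, it introduces their different design ideas and application scenarios, presents comparisons of some experimental results, and points out how the conditions for applying model distillation differ between classification and detection or segmentation. Next, it introduces several typical model distillation methods specially designed for object detection and semantic segmentation, explains their design goals and ideas in light of the model structures, and provides a comparison and analysis of some experimental results. Finally, it summarizes and analyzes the current state of model distillation methods in high-level semantic analysis, points out existing difficulties and shortcomings, and envisions possible directions for future exploration and development.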
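The teacher-student supervision described in the abstract above can be illustrated with a small sketch. It assumes the common soft-target formulation, in which the student matches the teacher's temperature-softened outputs in addition to the ground-truth label. The function names, the `temperature` and `alpha` parameters, and the weighting are illustrative assumptions and are not taken from the survey.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term (assumed form)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Cross-entropy between softened teacher and student distributions.
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth class index.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft term's gradient scale comparable.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

# Hypothetical logits: one student agrees with the teacher, one does not.
teacher = [2.0, 1.0, 0.0]
loss_close = distillation_loss([2.0, 1.0, 0.0], teacher, label=0)
loss_far = distillation_loss([0.0, 1.0, 2.0], teacher, label=0)
print(loss_close, loss_far)
```

A student whose outputs agree with the teacher receives a lower loss than one that disagrees, which is the extra supervision signal the survey refers to. Feature-based and similarity-based variants would replace the soft term with a distance between hidden representations.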
- Conference Article
7
- 10.1109/icra48506.2021.9560798
- May 30, 2021
Panoptic segmentation aims to provide an understanding of background (stuff) and instances of objects (things) at a pixel level. It combines the separate tasks of semantic segmentation (pixel-level classification) and instance segmentation to build a single unified scene understanding task. Typically, panoptic segmentation is derived by combining semantic and instance segmentation tasks that are learned separately or jointly (multi-task networks). In general, instance segmentation networks are built by adding a foreground mask estimation layer on top of object detectors or using instance clustering methods that assign a pixel to an instance center. In this work, we present a fully convolutional neural network that learns instance segmentation from semantic segmentation and instance contours (boundaries of things). Instance contours along with semantic segmentation yield a boundary-aware semantic segmentation of things. Connected component labeling on these results produces instance segmentation. We merge semantic and instance segmentation results to output panoptic segmentation. We evaluate our proposed method on the CityScapes dataset to demonstrate qualitative and quantitative performance along with several ablation studies. Our overview video can be accessed from https://youtu.be/wBtcxRhG3e0.
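The step from semantic segmentation plus instance contours to instance labels can be sketched as follows. This is a minimal illustration using 4-connected component labeling on plain nested lists, not the authors' implementation. The function name, the input layout, and the choice of 4-connectivity are assumptions.

```python
from collections import deque

def instances_from_contours(semantic, contours, thing_classes):
    """Assign one id per connected region of same-class 'thing' pixels,
    treating contour pixels as separators (they keep id 0)."""
    height, width = len(semantic), len(semantic[0])
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for y in range(height):
        for x in range(width):
            if (labels[y][x] or contours[y][x]
                    or semantic[y][x] not in thing_classes):
                continue
            next_id += 1
            cls = semantic[y][x]
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and not contours[ny][nx]
                            and semantic[ny][nx] == cls):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels

# Hypothetical 2x5 scene: class 1 is a 'thing', class 0 is 'stuff'.
semantic = [[1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0]]
contours = [[0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0]]
instance_ids = instances_from_contours(semantic, contours, {1})
print(instance_ids)
```

In this toy scene the contour pixel splits one semantic blob into two instances. A full pipeline would also need to reassign contour pixels to a neighbouring instance, which is omitted here.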
- Research Article
- 10.1155/acis/1918054
- Jan 1, 2025
- Applied Computational Intelligence and Soft Computing
Deep learning–based segmentation models have gained significant attention in various computer vision applications, including remote sensing and medical imaging. Existing deep learning architectures handle semantic and instance segmentation separately and suffer from limitations such as imprecise boundary delineation, poor spatial consistency, improper fine‐grained object separation, and inaccurate instance segmentation, particularly when handling intricate object structures in remote sensing images (RSI). To mitigate these issues, we propose a unified deep framework that integrates both semantic and instance segmentation within a single architecture tailored for high‐resolution RSI. Our framework combines an improved attention residual U‐Net (IARU‐Net) for pixel‐level semantic segmentation and a dynamic Mask R‐CNN for instance‐level segmentation. To further refine spatial coherence and boundary delineation, we apply conditional random fields (CRFs) as a postprocessing step on the output segmentation map of the enhanced U‐Net. This refined semantic mask serves as input to the dynamic Mask R‐CNN model for instance segmentation, where the graph‐based refinement module (GRM) is employed to improve boundary accuracy by leveraging graph‐based smoothing techniques. Our approach improves object delineation, increases segmentation accuracy, and decreases false positives compared to conventional deep learning architectures. Evaluation on standard datasets shows that the proposed approach attains superior performance in both semantic and instance segmentation tasks. The results validate the effectiveness of jointly modeling semantic and instance‐level information, providing a more comprehensive understanding of complex remote sensing scenes.
- Conference Article
61
- 10.1117/12.2304711
- Apr 30, 2018
Object recognition and semantic segmentation have been the two most common problems of traditional scene understanding in the computer vision domain. Major breakthroughs were reported in the last few years because of the increased use of deep learning, which offers a convincing alternative by learning problem-specific features on its own. In this paper, a summary of the frequently used framework, the convolutional neural network (CNN), is given. A categorization scheme is then proposed to analyze the deep networks developed for image segmentation. Under this scheme, thirteen methods from the literature are reviewed and classified on the basis of how they perform segmentation, i.e. semantic segmentation, instance segmentation and hybrid approaches. These methods were reviewed from different aspects, such as their category, the novelty of their architecture, and their special features in contrast with traditional approaches. This review and analysis of segmentation approaches, which provided outstanding results for image segmentation compared to conventional systems, reveals that deep learning is becoming an increasingly important part of image segmentation and that improvements in deep learning algorithms could resolve computer vision problems.
- Research Article
6
- 10.3390/math10142365
- Jul 6, 2022
- Mathematics
Environment perception and understanding represent critical aspects in most computer vision systems and/or applications. State-of-the-art techniques to solve this vision task (e.g., semantic instance segmentation) require either dedicated hardware resources to run or a longer execution time. Generally, the main efforts were to improve the accuracy of these methods rather than make them faster. This paper presents a novel solution to speed up the semantic instance segmentation task. The solution combines two state-of-the-art methods from semantic instance segmentation and optical flow fields. To reduce the inference time, the proposed framework (i) runs the inference on every 5th frame, and (ii) for the remaining four frames, it uses the motion map computed by optical flow to warp the instance segmentation output. Using this strategy, the execution time is strongly reduced while preserving the accuracy at state-of-the-art levels. We evaluate our solution on two datasets using available benchmarks. Then, we conclude on the results obtained, highlighting the accuracy of the solution and the real-time operation capability.
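The key-frame strategy described above (full inference on every 5th frame, flow-based warping in between) can be sketched as follows. The `segment` and `estimate_flow` callables are hypothetical stand-ins for the segmentation network and the optical flow method, and the nearest-pixel forward warp is a simplifying assumption, not the paper's warping scheme.

```python
def warp_labels(labels, flow):
    """Forward-warp a label map: the label at (y, x) moves to
    (y + dy, x + dx), where flow[y][x] == (dy, dx)."""
    height, width = len(labels), len(labels[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not labels[y][x]:
                continue
            dy, dx = flow[y][x]
            ny, nx = y + round(dy), x + round(dx)
            if 0 <= ny < height and 0 <= nx < width:
                warped[ny][nx] = labels[y][x]
    return warped

def segment_video(frames, segment, estimate_flow, key_interval=5):
    """Run `segment` on key frames only; warp the previous result otherwise."""
    results = []
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            results.append(segment(frame))
        else:
            flow = estimate_flow(frames[index - 1], frame)
            results.append(warp_labels(results[-1], flow))
    return results

# Toy data: a single bright pixel moving one step right per frame.
frames = [[[1 if x == i else 0 for x in range(8)]] for i in range(6)]
segment_calls = []

def toy_segment(frame):  # hypothetical stand-in for the network
    segment_calls.append(1)
    return [[1 if value else 0 for value in row] for row in frame]

def toy_flow(previous, current):  # hypothetical: constant motion (0, +1)
    return [[(0, 1) for _ in row] for row in current]

outputs = segment_video(frames, toy_segment, toy_flow)
print(len(segment_calls), outputs[3])
```

With six frames and `key_interval=5`, the expensive segmentation call runs only on frames 0 and 5, and the intermediate frames reuse the warped result. Warping errors accumulate between key frames, so the interval trades speed against accuracy.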
- Research Article
6
- 10.3390/rs13214357
- Oct 29, 2021
- Remote Sensing
As-is building modeling plays an important role in energy audits and retrofits. However, in order to understand the source(s) of energy loss, researchers must know the semantic information of the buildings and outdoor scenes. Thermal information can potentially be used to distinguish objects that have similar surface colors but are composed of different materials. To utilize both the red–green–blue (RGB) color model and thermal information for the semantic segmentation of buildings and outdoor scenes, we deployed and adapted various pioneering deep convolutional neural network (DCNN) tools that combine RGB information with thermal information to improve the semantic and instance segmentation processes. When both types of information are available, the resulting DCNN models allow us to achieve better segmentation performance. By deploying three case studies, we experimented with our proposed DCNN framework, deploying datasets of building components and outdoor scenes, and testing the models to determine whether the segmentation performance had improved or not. In our observation, the fusion of RGB and thermal information can help the segmentation task in specific cases, but it might also make the neural networks hard to train or deteriorate their prediction performance in some cases. Additionally, different algorithms perform differently in semantic and instance segmentation.
- Research Article
1
- 10.5194/isprs-archives-xlviii-2-2024-435-2024
- Jun 11, 2024
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Semantic instance segmentation of scenes plays a crucial role in 3D modelling and scene understanding. Existing state-of-the-art methods perform semantic segmentation before grouping instances. However, without additional refinement, semantic errors fully propagate into the grouping stage, resulting in low overlap with the ground-truth instances. Furthermore, these methods focus on indoor scenes and are limited when directly applied to large-scale outdoor Airborne Laser Scanning (ALS) point clouds. Numerous instances, significant object density and scale variations make ALS point clouds distinct from indoor data. To address these problems, we propose a geometric characterization-aware semantic instance segmentation network, which uses both semantic and objectness scores to select potential points for grouping. In the point cloud feature learning stage, hand-crafted geometric features are taken as input for geometric characterization awareness. Moreover, to address errors propagated from previous modules after grouping, we additionally design a per-instance refinement module. To assess semantic instance segmentation, we conducted experiments on an open-source dataset. Additionally, we performed semantic segmentation experiments to evaluate the performance of our proposed point cloud feature learning method.
- Research Article
- 10.1049/el.2020.1162
- Aug 28, 2020
- Electronics Letters
The general objective of this research is non-destructive assessment of grain size in metals. The authors suggest a new way of applying neural network technology, in which the neural network is used to analyse ultrasonic structural noise. The signals of ultrasonic structural noise are assumed to be measured at several frequencies. To address structural noise, a convolutional neural network is designed to process ultrasonic sensor data, learn structural noise features and estimate grain size directly at the same time. To minimise the amount of data gathered from metal samples, the design applies the concept of semantic instance segmentation to the neural network for data extrapolation. Experimental results show that the proposed method, semantic instance segmentation with combined convolutional and fully connected dense neural networks with classifiers, outperforms the other single neural networks on original samples with high signal-to-noise ratio (SN) data.
- Research Article
30
- 10.1016/j.rse.2021.112757
- Oct 22, 2021
- Remote Sensing of Environment
Utilizing unsupervised learning, multi-view imaging, and CNN-based attention facilitates cost-effective wetland mapping
- Research Article
13
- 10.3390/app12157811
- Aug 3, 2022
- Applied Sciences
The modern urban environment is becoming increasingly complex. To identify surrounding objects, vehicle vision sensors rely increasingly on the semantic segmentation ability of deep learning networks. The performance of the semantic segmentation network is therefore essential, because it directly affects how well driving assistance technology perceives the road environment. However, existing semantic segmentation networks have redundant structures, many parameters, and low operational efficiency. Therefore, to reduce network complexity and parameter count and thereby improve efficiency, this work studies a method for efficient image semantic segmentation using a deep convolutional neural network (DCNN), based on deep learning (DL) theory. First, the theoretical basis of the convolutional neural network (CNN) is briefly introduced, and real-time semantic segmentation of urban scenes based on DCNNs is described in detail. Second, the atrous convolution algorithm and the multi-scale parallel atrous spatial pyramid model are introduced. On this basis, an Efficient Symmetric Network (ESNet), a real-time semantic segmentation model for autonomous driving scenarios, is proposed. The experimental results show that: (1) On the Cityscapes dataset, ESNet achieves 70.7% segmentation accuracy on the set of 19 semantic categories and 87.4% on the seven large grouping categories. Compared with other algorithms, the accuracy has increased to varying degrees. (2) On the CamVid dataset, compared with several lightweight real-time segmentation networks, the ESNet model has around 1.2 M parameters, a highest frame rate of around 90 FPS, and a highest mIoU of around 70%. In seven semantic categories, the segmentation accuracy of the ESNet model is the highest at around 98%.
From this, we found that the ESNet significantly improves segmentation accuracy while maintaining faster forward inference speed. Overall, the research not only provides technical support for the development of real-time semantic understanding and segmentation of DCNN algorithms but also contributes to the development of artificial intelligence technology.
- Conference Article
246
- 10.1109/cvpr42600.2020.00905
- Jun 1, 2020
We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features. Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score and aggregation features. Previous works usually perform non-maximum-suppression (NMS) over proposals to obtain the final object detections or semantic instances. However, NMS can discard potentially correct predictions. Instead, our approach keeps all proposals and groups them together based on the learned aggregation features. We show that grouping proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.
- Research Article
26
- 10.1002/ohn.317
- Mar 8, 2023
- Otolaryngology–Head and Neck Surgery
A Self-Configuring Deep Learning Network for Segmentation of Temporal Bone Anatomy in Cone-Beam CT Imaging.