Open Vocabulary Research Articles

Open-vocabulary learning recognizes categories annotated during training (seen categories) and generalizes to categories absent from the training annotations (unseen categories). In principle, it could extend segmentation systems to far more general applications. However, current open-vocabulary segmentation frameworks are mostly designed for specific tasks or must be retrained for each task, and on seen categories they significantly underperform fully supervised frameworks. We therefore introduce SegLD, a universal open-vocabulary segmentation framework based on the latent diffusion process. SegLD requires only a single training session on a panoptic dataset to perform inference across all open-vocabulary segmentation tasks, and it reaches state-of-the-art (SOTA) segmentation performance for both seen and unseen categories in every task. SegLD comprises two stages. In the first stage, we deploy two parallel latent diffusion processes to deeply fuse the text (image captions or category labels) and image information, and then aggregate the multi-scale features output by both processes scale by scale. In the second stage, we introduce text queries, text list queries, and task queries, and compute contrastive losses between them to facilitate learning of inter-category and inter-task differences. The text queries are then fed into a Transformer decoder to obtain category-agnostic segmentation masks. Finally, we establish classification loss functions according to the type of text input used during training, either image captions or category labels, to help assign a category label from the open vocabulary to each predicted binary mask.
Experimental results show that, with a single training session, SegLD significantly outperforms contemporary SOTA fully supervised and open-vocabulary segmentation frameworks on almost all evaluation metrics, for both seen and unseen categories, on the ADE20K, Cityscapes, and COCO datasets. This highlights SegLD's capability as a universal segmentation framework, with the potential to replace other segmentation frameworks and adapt to various segmentation domains. The project page for SegLD is https://zht-segld.github.io/.
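
The final step described in the abstract, assigning an open-vocabulary category label to each predicted binary mask, can be illustrated with a minimal sketch. This is not SegLD's actual implementation: it assumes each mask and each category name has already been mapped to an embedding in a shared space, and it uses cosine similarity with a temperature-scaled softmax, a pattern common in open-vocabulary models. The function names, the toy embeddings, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_labels(mask_embeddings, text_embeddings, temperature=0.07):
    """Pick, for each mask embedding, the most similar category name.

    text_embeddings maps category name -> embedding. The vocabulary can be
    changed at inference time, which is what makes the labeling "open".
    The temperature is an assumed value, not one taken from the paper.
    """
    names = list(text_embeddings)
    results = []
    for emb in mask_embeddings:
        logits = [cosine(emb, text_embeddings[n]) / temperature for n in names]
        probs = softmax(logits)
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs[best]))
    return results


# Toy, hand-written embeddings (hypothetical). "zebra" stands in for a
# category that was never annotated during training.
vocabulary = {
    "road": [1.0, 0.0, 0.0],
    "tree": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}
masks = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
labels = assign_labels(masks, vocabulary)
print([name for name, _ in labels])  # ['road', 'zebra']
```

Because the vocabulary is just a dictionary passed in at call time, adding a new category only requires a new text embedding, with no retraining of the mask predictor in this simplified picture.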

Deep neural networks (DNNs) have achieved unprecedented success across many scientific and engineering fields in recent decades. Despite their empirical success, recent studies have shown that DNN models have various failure modes and blind spots, such as the existence of adversarial examples and sensitivity to small perturbations, which may result in unexpected serious failures and potential harms. This is unacceptable, especially for safety-critical and high-stakes real-world applications, including healthcare, self-driving cars, aircraft control systems, hiring, and malware detection protocols. Moreover, it has been challenging to understand why and when DNNs will fail, owing to their complicated structures and black-box behaviors. Lack of interpretability is a critical issue that may seriously hinder the deployment of DNNs in high-stakes applications, which need interpretability to trust predictions, to understand potential failures, and to mitigate harms and eliminate biases in the model. To make DNNs trustworthy and reliable for deployment, it is necessary and urgent to develop methods and tools that can (i) quantify and improve their robustness against adversarial and natural perturbations, and (ii) explain their underlying behaviors and correct errors to prevent injuries and damages. These are important first steps toward enabling Trustworthy AI and Trustworthy Machine Learning. In this talk, I will survey a series of research efforts in my lab aimed at tackling the grand challenges in (i) and (ii). In the first part of my talk, I will give an overview of our research in Robust Machine Learning since 2017, where we have proposed the first attack-agnostic robustness evaluation metric, the first efficient robustness certification algorithms for various types of perturbations, and efficient robust learning algorithms spanning supervised learning and deep reinforcement learning.
In the second part of my talk, I will survey a series of exciting results from my lab on accelerating interpretable machine learning and explainable AI. Specifically, I will show how we can bring interpretability into deep learning by leveraging recent advances in multi-modal models. I will present recent work in our group on automatically dissecting neural networks with open-vocabulary concepts and on designing interpretable neural networks without concept labels, and I will briefly review our recent efforts on demystifying the black-box DNN training process, automated neuron explanations for large language models, and the first robustness evaluation of a family of neuron-level interpretation techniques.

Articles published on Open Vocabulary

Physically-guided open vocabulary segmentation with weighted patched alignment loss

TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding.

Towards Open Vocabulary Learning: A Survey.

SegLD: Achieving universal, zero-shot and open-vocabulary segmentation through multimodal fusion via latent diffusion processes

Exploration of an Open Vocabulary Model on Semantic Segmentation for Street Scene Imagery

CLIM: Contrastive Language-Image Mosaic for Region Representation

Towards Trustworthy Deep Learning

Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach

OHDSI Standardized Vocabularies-a large-scale centralized reference ontology for international data harmonization.

Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding.

Emotional Video Captioning With Vision-Based Emotion Interpretation Network.

Learning to Answer Visual Questions from Web Videos.

Improved Small-Footprint ASR-Based Solution for Open Vocabulary Keyword Spotting

Computational Analysis of Printed Arabic Text Database for Natural Language Processing

SMART-FCD: IOT DATA INTEROPERABILITY USING SENSOR BASED FUZZY LINKED RULES FOR CROSS DOMAIN APPLICATIONS

OREV: An item response theory-based open receptive vocabulary task for 3- to 8-year-old children

Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

DocEdit: Language-Guided Document Editing

Predicting U.S. county opioid poisoning mortality from multi-modal social media and psychological self-report data

The reuse of DCMI metadata terms in linked open vocabulary
