
Multi-modal transformer for video retrieval using improved sentence embeddings

Abstract

With the explosive growth in the number of online videos, video retrieval has become increasingly difficult. Video-text retrieval based on multi-modal visual and language understanding is one of the mainstream frameworks for this problem, and MMT (Multi-modal Transformer) is a novel and widely used model within it. On the language side, MMT encodes text with BERT (Bidirectional Encoder Representations from Transformers), and the pretrained BERT is fine-tuned during training. However, there is a mismatch at this stage: BERT's pre-training tasks, NSP (Next Sentence Prediction) and MLM (Masked Language Model), correlate only weakly with video retrieval, whereas the text encoder is expected to encode text into semantic embeddings. On the visual side, a Transformer aggregates the multi-modal experts extracted from videos, and we find that the output of this visual Transformer is not fully utilized. In this paper, we substitute a Sentence-BERT model for the BERT model in MMT to improve the efficiency of the sentence embeddings. In addition, we add a max-pooling layer after the Transformer to make better use of its output. Experimental results show that the proposed model outperforms MMT.
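The two modifications described above can be sketched in plain Python, with lists standing in for tensors. This is an illustrative sketch, not the authors' code: it assumes Sentence-BERT-style mean pooling over token outputs for the query embedding, an element-wise max over the visual Transformer's output tokens for the video embedding, and cosine similarity for ranking. The function names and toy vectors are hypothetical, and MMT's per-expert similarity weighting is omitted.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Sentence embedding as the average of token embeddings (Sentence-BERT-style mean pooling)."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]


def max_pool(token_outputs: List[Vector]) -> Vector:
    """Video embedding as the element-wise maximum over all Transformer output tokens."""
    return [max(col) for col in zip(*token_outputs)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(query_tokens: List[Vector], videos: List[List[Vector]]) -> List[int]:
    """Return video indices ordered from most to least similar to the text query."""
    q = mean_pool(query_tokens)
    scores = [cosine(q, max_pool(v)) for v in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i], reverse=True)


query = [[1.0, 0.0], [0.8, 0.2]]  # toy token outputs for one sentence
videos = [
    [[0.0, 1.0], [0.1, 0.9], [0.2, 0.7]],  # mostly along the second axis
    [[0.9, 0.1], [0.3, 0.2], [0.5, 0.4]],  # mostly along the first axis
]
print(rank_videos(query, videos))  # prints [1, 0]
```

Max-pooling keeps, for each dimension, the strongest response from any output token, so information carried by every output token, not just one aggregated token, can reach the final video embedding.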

Similar Papers
  • Conference Article
  • 10.54941/ahfe1001177
Effective deep learning through bidirectional reading on masked language model
  • Jan 1, 2021
  • AHFE international
  • Hiroyuki Nishimoto

Google BERT is a neural network that is good at natural language processing. It has two major strategies: the “Masked Language Model”, which clarifies word-level relationships, and “Next Sentence Prediction”, which clarifies sentence-level relationships. In the Masked Language Model, some words in a sentence are masked, and BERT learns to predict the original words from context. Some questions come to mind. Why does BERT achieve effective learning by reading in two directions, forward and backward? What is the difference between the two directions of reading? Consider a Masked Language Model task in which the middle sentence is “I ate [mask] every morning”. First, predict the masked word by forward reading. The previous sentence is “Considering my health, I decided to change the breakfast menu”. What is asked in general? We usually think about which option is more realistic and reach “an apple”. The answer is “feasibility”. Next, predict the masked word by backward reading, focusing on the middle sentence and the following sentence, “A month later, I lost 3 kg and became healthy.” What is asked in general? In such a situation, we are looking for success factors. We usually think about which option is more relevant and reach “an apple”. The answer is “causality”. Therefore, BERT learns to predict feasibility by forward reading and causality by backward reading. Besides, the bidirectional reading technique can be applied to scenario planning using back-casting from the future. Scenario planning is making assumptions about what the future is going to be. A scenario can be described in two ways: fore-casting and back-casting. Fore-casting means viewing from the present to the future. In general, back-casting means viewing from the present to the past, but in this paper it means viewing from the future to the present.
Just as there are two different predictions for bidirectional reading, there is a big difference between fore-casting into the future and back-casting from the future. How do you feel about the first scenario, written with fore-casting? You tend to focus on feasibility, so a long debate about feasibility starts: “Is it possible?”, “Is it difficult?”, and “How can it be achieved?”. On the other hand, how do you feel about the second scenario, written with back-casting from the future to the present? This scenario has to be written in the past tense because of back-casting. When it is written in the past tense, you feel that it has already been done and that someone resolved all the problems by that time. The surrounding words in the past tense change your feeling from prediction to event context. You tend to focus on the causal factors of success, and you can escape from the long debate. Scenario planning using back-casting from the future to the present makes a good proposition while avoiding long discussions. Besides, regarding the mystery of deep learning, each answer lies in the human thinking mechanism, because AI is created by imitating the human brain.

  • Conference Article
  • Cited by 64
  • 10.18653/v1/2020.acl-main.247
Span Selection Pre-training for Question Answering
  • Jan 1, 2020
  • Michael Glass + 7 more

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pretrained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension, to shift pre-training from memorization toward understanding. Span Selection Pre-Training (SSPT) poses cloze-like training instances, but rather than drawing the answer from the model’s parameters, it selects the answer from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model has strong empirical evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact in HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.
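A minimal sketch of how one span-selection training instance could be assembled, assuming the sentence, the answer term, and a relevant passage are already available (retrieving that passage and choosing answer terms are outside the sketch). The answer is blanked out of the sentence to form a cloze-like query, and the training target is the answer's token span in the passage rather than a prediction over the vocabulary. The `[BLANK]` token and the function name are assumptions.

```python
from typing import List, Optional, Tuple


def make_span_selection_instance(
    sentence: List[str], answer: List[str], passage: List[str]
) -> Optional[Tuple[List[str], Tuple[int, int]]]:
    """Blank `answer` out of `sentence` and locate it as a span in `passage`.

    Returns (query_tokens, (start, end)) with inclusive token indices into
    `passage`, or None if the answer does not occur in both texts.
    """
    if not answer:
        return None
    k = len(answer)
    # Replace the first occurrence of the answer in the sentence with one [BLANK] token.
    for i in range(len(sentence) - k + 1):
        if sentence[i:i + k] == answer:
            query = sentence[:i] + ["[BLANK]"] + sentence[i + k:]
            break
    else:
        return None
    # The target is the first occurrence of the answer in the passage.
    for j in range(len(passage) - k + 1):
        if passage[j:j + k] == answer:
            return query, (j, j + k - 1)
    return None


sentence = "the eiffel tower is located in paris".split()
passage = "paris is home to the eiffel tower , built in 1889".split()
print(make_span_selection_instance(sentence, ["paris"], passage))
# prints (['the', 'eiffel', 'tower', 'is', 'located', 'in', '[BLANK]'], (0, 0))
```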

  • Video Transcripts
  • 10.48448/2afs-nt28
Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
  • Oct 15, 2021
  • Underline Science Inc.
  • Nikolaos Aletras + 3 more

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives on the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other simpler objectives, linguistically intuitive or not, can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better performance than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE's parameters, results in only a 1% drop in GLUE scores with our best objective.
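For concreteness, the MLM input corruption described above might look like the following plain-Python sketch. It follows the abstract's description (selected tokens become [MASK]) and uses BERT's default 15% rate; BERT's additional rule of leaving some selected tokens unchanged or swapping in random tokens is deliberately omitted. The function name is hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random sample of tokens with [MASK] for MLM training.

    Returns (inputs, labels). labels[i] is the original token at masked
    positions and None elsewhere, so the multi-class loss over the
    vocabulary is computed only at masked positions.
    """
    if not tokens:
        return [], []
    rng = rng or random.Random()
    n_mask = min(len(tokens), max(1, round(mask_prob * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels


inputs, labels = mask_tokens("the cat sat on the mat".split(), rng=random.Random(0))
print(inputs)
print(labels)
```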

  • Conference Article
  • Cited by 19
  • 10.18653/v1/2021.emnlp-main.249
Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
  • Jan 1, 2021
  • Atsuki Yamaguchi + 3 more

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives on the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other simpler objectives, linguistically intuitive or not, can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better performance than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE’s parameters, results in only a 1% drop in GLUE scores with our best objective.
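The abstract does not spell out the five objectives, so the following is only an illustration of what a token-level classification replacement for MLM can look like: shuffled-token detection, where a sample of positions is permuted among themselves and the model labels each token as original (0) or replaced (1). Treat the choice of objective, the 15% rate, and the function name as assumptions. The key difference from MLM is the output space: a 2-way decision per token instead of a prediction over the full vocabulary.

```python
import random
from typing import List, Optional, Tuple


def shuffle_detection_example(
    tokens: List[str], shuffle_prob: float = 0.15, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[int]]:
    """Permute a random subset of positions among themselves.

    Returns (corrupted, labels) with labels[i] = 1 where the token at
    position i differs from the original and 0 otherwise.
    """
    rng = rng or random.Random()
    n = min(len(tokens), max(2, round(shuffle_prob * len(tokens))))
    positions = rng.sample(range(len(tokens)), n)
    targets = positions[:]
    rng.shuffle(targets)
    corrupted = tokens[:]
    for src, dst in zip(positions, targets):
        corrupted[dst] = tokens[src]
    # A 2-way label per token replaces MLM's prediction over the whole vocabulary.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


corrupted, labels = shuffle_detection_example(list("abcdefghijklmnopqrst"), rng=random.Random(1))
print("".join(corrupted), labels)
```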

  • Book Chapter
  • Cited by 366
  • 10.1007/978-3-030-22747-0_7
Conditional BERT Contextual Augmentation
  • Jan 1, 2019
  • Xing Wu + 4 more

We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. Recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. BERT demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language model or the shallow concatenation of a forward and a backward model. We retrofit BERT into conditional BERT by introducing a new conditional masked language model task. (The term "conditional masked language model" appeared once in the original BERT paper, where it means context-conditional and is equivalent to "masked language model". In our paper, "conditional masked language model" means that we apply an extra label-conditional constraint to the "masked language model".) The well-trained conditional BERT can be applied to enhance contextual augmentation. Experiments on six different text classification tasks show that our method can be easily applied to both convolutional and recurrent neural network classifiers to obtain clear improvements.
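The augmentation step can be sketched as follows, assuming a trained label-conditional masked language model is available. How the label is injected into the model is abstracted behind a hypothetical `predict(masked_tokens, label)` callable that returns candidate words for the [MASK] position; the toy predictor below stands in for conditional BERT and is not part of any real library.

```python
import random
from typing import Callable, List, Optional

# Hypothetical interface: returns candidate words for the single [MASK]
# in the sentence, ranked by a label-conditional masked language model.
Predictor = Callable[[List[str], str], List[str]]


def conditional_augment(
    tokens: List[str],
    label: str,
    predict: Predictor,
    n_replace: int = 1,
    top_k: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Create a new labeled sentence by replacing words with label-compatible substitutes."""
    rng = rng or random.Random()
    augmented = tokens[:]
    for pos in rng.sample(range(len(tokens)), min(n_replace, len(tokens))):
        masked = augmented[:]
        masked[pos] = "[MASK]"
        # Conditioning on the label keeps the substitute consistent with it,
        # e.g. a positive review should not receive "awful".
        candidates = predict(masked, label)[:top_k]
        if candidates:
            augmented[pos] = rng.choice(candidates)
    return augmented


def toy_predict(masked: List[str], label: str) -> List[str]:
    """Stand-in for conditional BERT: ignores context and uses only the label."""
    return ["great", "wonderful"] if label == "positive" else ["awful", "dull"]


print(conditional_augment("the film was good".split(), "positive", toy_predict, rng=random.Random(0)))
```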

  • Conference Article
  • Cited by 2
  • 10.1109/ichi52183.2021.00093
Contrastive Representations Pre-Training for Enhanced Discharge Summary BERT
  • Aug 1, 2021
  • Daeyeon Won + 3 more

Recently, BERT has shown tremendous improvements in performance on various NLP tasks, and it has been applied to many domains, including the biomedical field. Especially in the clinical domain, the semantic relationship between sentences is very important for understanding a patient’s medical record and health history in physical examinations. However, the pre-training method of the current Clinical BERT model has difficulty capturing sentence-level semantics. To address this problem, we propose contrastive representations pre-training (CRPT), which enhances the contextual meaning between sentences by replacing the cross-entropy loss with a contrastive loss in the next sentence prediction (NSP) task. We also tried to improve performance by changing the random masking technique to whole word masking (WWM) for the masked language model (MLM). In particular, we focus on enhancing the language representations of the BERT model by pretraining on discharge summaries to optimize it for clinical studies. We demonstrate that our CRPT strategy yields performance improvements on clinical NLP tasks in the BLUE (Biomedical Language Understanding Evaluation) benchmark dataset.
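The abstract does not give the exact contrastive loss, so the sketch below assumes a common choice, an InfoNCE-style loss with in-batch negatives: each sentence embedding should be closer to the embedding of its true next sentence than to the next sentences of the other pairs in the batch. Compared with NSP's binary cross-entropy (is or is not the next sentence), every other pair in the batch acts as a negative. The temperature value and function names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchors: List[Vector], next_sents: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where next_sents[i] is the true continuation of anchors[i].

    For each anchor, the other rows of next_sents serve as in-batch negatives.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in next_sents]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(anchors, anchors))                   # matched pairs: loss near 0
print(info_nce(anchors, [anchors[1], anchors[0]]))  # swapped pairs: loss near 10
```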

  • Conference Article
  • Cited by 33
  • 10.18653/v1/2020.acl-main.24
Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
  • Jan 1, 2020
  • Yi Liao + 2 more

Masked language models and autoregressive language models are two types of language models. While pretrained masked language models such as BERT dominate natural language understanding (NLU) tasks, autoregressive language models such as GPT are especially capable in natural language generation (NLG). In this paper, we propose a probabilistic masking scheme for the masked language model, which we call probabilistically masked language model (PMLM). We implement a specific PMLM with a uniform prior distribution on the masking ratio, named u-PMLM. We prove that u-PMLM is equivalent to an autoregressive permutated language model. One main advantage of the model is that it supports text generation in arbitrary order with surprisingly good quality, which could potentially enable new applications beyond traditional unidirectional generation. Besides, the pretrained u-PMLM also outperforms BERT on a set of downstream NLU tasks.
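A sketch of the two ideas in plain Python. `pmlm_mask` draws a masking ratio from the uniform prior U(0, 1) for each sequence and then masks each token independently with that probability (the per-token Bernoulli masking is our assumption about how the ratio is applied). `generate_in_random_order` fills an all-[MASK] sequence one position at a time in a random order, which is what arbitrary-order generation means here; `predict` is a hypothetical stand-in for the trained model.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = "[MASK]"


def pmlm_mask(tokens: List[str], rng: Optional[random.Random] = None) -> Tuple[List[str], float]:
    """Mask tokens with a ratio sampled from U(0, 1) instead of a fixed 15%."""
    rng = rng or random.Random()
    ratio = rng.random()  # masking ratio drawn from the uniform prior
    return [MASK if rng.random() < ratio else t for t in tokens], ratio


def generate_in_random_order(
    length: int,
    predict: Callable[[List[str], int], str],
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[int]]:
    """Start from all [MASK] tokens and fill one position at a time in a random order."""
    rng = rng or random.Random()
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)
    for pos in order:
        # Hypothetical model call: predict the token at `pos` given the partial sequence.
        seq[pos] = predict(seq, pos)
    return seq, order


def placeholder_model(seq: List[str], pos: int) -> str:
    """Stand-in for a trained u-PMLM: emits a position-tagged dummy token."""
    return f"w{pos}"


print(generate_in_random_order(4, placeholder_model, random.Random(0)))
```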

  • Conference Article
  • Cited by 3
  • 10.1109/icc45855.2022.9838770
A GAN-Bert Based Fault Diagnosis Model for CBTC Data Communication Systems Using Edge-to-edge Collaboration Training
  • May 16, 2022
  • Qingheng Zhuang + 2 more

Communication-Based Train Control (CBTC) systems use wireless train-ground communication to ensure safe train operation. CBTC consists of four subsystems, and the Data Communication System (DCS), which plays a vital role in train-ground transmission, is one of the most important. Many DCS devices are exposed to the environment, so train operation in bad weather easily leads to a series of hardware failures. A single DCS fault threatens the efficiency of CBTC and even the safety of passengers. Since DCS failures are inevitable, faults must be analyzed and identified quickly and accurately. This paper proposes a GAN-Bert based edge-to-edge model to identify DCS faults. Bidirectional Encoder Representations from Transformers (BERT) extracts text features for fault diagnosis using the Masked Language Model (MLM) and Next Sentence Prediction (NSP). Meanwhile, a Generative Adversarial Network (GAN) is used to perform DCS fault diagnosis, and edge-to-edge collaboration contributes to training a better model. We use raw DCS logs and compare the prediction results of the GAN-Bert and Bert-only models. The simulation results show that the GAN-Bert based model achieves higher accuracy.
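The abstract does not detail how the GAN is combined with BERT. Assuming the usual GAN-BERT formulation, the discriminator classifies BERT features into the k fault classes plus one extra class for generator output, and its loss combines a supervised term on labeled logs with an unsupervised real-versus-generated term. The sketch below computes that loss from raw logits in plain Python; the edge-to-edge collaboration is not modeled, and all names are hypothetical.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(
    real_logits: List[List[float]],
    real_labels: List[Optional[int]],
    fake_logits: List[List[float]],
    eps: float = 1e-12,
) -> float:
    """Discriminator loss over k fault classes plus a final 'generated' class.

    real_labels[i] is a fault class in [0, k) for labeled logs, or None for
    unlabeled ones. Both batches are assumed to be non-empty.
    """
    sup, n_sup, unsup_real = 0.0, 0, 0.0
    for logits, y in zip(real_logits, real_labels):
        p = softmax(logits)
        unsup_real -= math.log(1.0 - p[-1] + eps)  # real logs should not look generated
        if y is not None:
            sup -= math.log(p[y] + eps)  # cross-entropy on the true fault class
            n_sup += 1
    # Generated features should be assigned to the extra class.
    unsup_fake = -sum(math.log(softmax(logits)[-1] + eps) for logits in fake_logits)
    loss = unsup_real / len(real_logits) + unsup_fake / len(fake_logits)
    return loss + (sup / n_sup if n_sup else 0.0)


real = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # k = 2 fault classes + 1 generated class
labels = [0, None]                         # the second log line is unlabeled
print(discriminator_loss(real, labels, [[0.0, 0.0, 5.0]]))  # correct on all inputs: small loss
print(discriminator_loss(real, labels, [[5.0, 0.0, 0.0]]))  # fooled by the generator: larger loss
```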

  • Research Article
  • Cited by 4
  • 10.54254/2755-2721/92/20241711
Applications of BERT in sentimental analysis
  • Oct 9, 2024
  • Applied and Computational Engineering
  • Zihan Su

This research study emphasizes sentiment analysis and examines Natural Language Processing (NLP) with Bidirectional Encoder Representations from Transformers (BERT). BERT's bidirectional Transformer architecture, pre-trained with Next Sentence Prediction (NSP) and Masked Language Modeling (MLM), has achieved a lot in terms of AI transformation. This paper describes the BERT design, its pre-training methods, and fine-tuning for sentiment analysis tasks. The study goes on to compare BERT's performance with other deep learning models, machine learning algorithms, and traditional rule-based techniques, highlighting the latter's limited ability to handle linguistic nuances and context. Additionally, it examines studies demonstrating the consistency and accuracy of BERT's sentiment analysis, along with the challenges of handling irony, sarcasm, and domain-specific data. The study also examines the ethical and privacy concerns that sentiment analysis inherently raises, makes recommendations for further research, and shows how integrating sentiment analysis with other domains can lead to multidisciplinary breakthroughs that offer more comprehensive insights and applications.

  • Conference Article
  • Cited by 1
  • 10.1145/3672758.3672854
Unveiling the Evolution and Impact of Pretrained Language Models in Natural Language Processing
  • Jan 26, 2024
  • Haoyu Bian

This comprehensive inquiry conducts an in-depth investigation into the development and influence of pretrained language models within the domain of natural language processing (NLP). It commences by elucidating the fundamental machine learning methodologies that underlie these models, encompassing Long Short-Term Memory Networks (LSTMs), Attention Mechanisms (AM), and the groundbreaking Transformer model, which serves as the fundamental framework for pretrained language models. The exploration proceeds through the historical evolution of pretrained language models, starting with early models such as Word2Vec and GloVe, which played pivotal roles in establishing efficient word representations. Subsequently, it delves into the second-generation models, typified by ELMo, which introduced advanced context comprehension and addressed inherent challenges such as polysemy. A significant milestone is marked by the advent of BERT (Bidirectional Encoder Representations from Transformers), which revolutionized the field with its bidirectional training approach, incorporating techniques like Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The investigation reaches its zenith with an exhaustive analysis of the GPT (Generative Pretrained Transformer) model and its subsequent iterations, including GPT-2 and GPT-3. These models have demonstrated exceptional capabilities in natural language generation and comprehension, achieved through self-supervised pretraining and fine-tuning procedures. Despite their remarkable achievements, pretrained language models confront several challenges, including the generation of potentially harmful outputs. Consequently, innovative solutions such as InstructGPT have been proposed. 
Looking ahead, pretrained language models are poised to maintain their influential role in reshaping the landscape of NLP, exerting a profound impact across various applications, and serving as an indispensable foundational pillar in contemporary NLP research and development.

  • Research Article
  • Cited by 20
  • 10.3390/sym11111393
Self-Supervised Contextual Data Augmentation for Natural Language Processing
  • Nov 11, 2019
  • Symmetry
  • Dongju Park + 1 more

In this paper, we propose a novel data augmentation method that takes the target context of the data into account via self-supervised learning. Instead of looking for exact synonyms of masked words, the proposed method finds words that can replace the original words given the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and recovers the original word. The MLM learns the context of a sentence through asymmetrical inputs and outputs. However, instead of using the existing MLM, we propose a label-masked language model (LMLM) that can include label information for the mask tokens used in the MLM, so that the MLM can be used effectively on data with label information. The augmentation method performs self-supervised learning using the LMLM and then implements data augmentation through the trained model. Through several experiments on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity (Subj), the Multi-Perspective Question Answering (MPQA), the Movie Reviews (MR), and the Text Retrieval Conference (TREC) datasets, we demonstrate that our proposed method improves the classification accuracy of recurrent neural network and convolutional neural network-based classifiers. In addition, since the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on external data.

  • Research Article
  • Cited by 23
  • 10.5715/jnlp.27.683
Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction
  • Sep 15, 2020
  • Journal of Natural Language Processing
  • Masahiro Kaneko

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed in the previous methods. Our experiments show that our proposed method, where we first fine-tune a MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performances on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: https://github.com/kanekomasahiro/bert-gec.

  • Conference Article
  • Cited by 137
  • 10.18653/v1/2020.acl-main.391
Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction
  • Jan 1, 2020
  • Masahiro Kaneko + 4 more

This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed in the previous methods. Our experiments show that our proposed method, where we first fine-tune a MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performances on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: https://github.com/kanekomasahiro/bert-gec.

  • Research Article
  • Cited by 22
  • 10.1609/aaai.v34i05.6265
An Iterative Polishing Framework Based on Quality Aware Masked Language Model for Chinese Poetry Generation
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Liming Deng + 7 more

Owing to its unique literal and aesthetic characteristics, automatic generation of Chinese poetry remains challenging in Artificial Intelligence and can hardly be realized straightforwardly by end-to-end methods. In this paper, we propose a novel iterative polishing framework for high-quality Chinese poetry generation. In the first stage, an encoder-decoder structure is used to generate a poem draft. Afterwards, our proposed Quality-Aware Masked Language Model (QA-MLM) is employed to polish the draft towards higher quality in terms of linguistics and literalness. Based on a multi-task learning scheme, QA-MLM is able to determine whether polishing is needed based on the poem draft. Furthermore, QA-MLM is able to localize improper characters in the poem draft and substitute newly predicted ones for them. Benefiting from the masked language model structure, QA-MLM incorporates global context information into the polishing process, which yields more appropriate polishing results than unidirectional sequential decoding. Moreover, the iterative polishing process terminates automatically when QA-MLM regards the processed poem as qualified. Both human and automatic evaluations have been conducted, and the results demonstrate that our approach effectively improves the performance of the encoder-decoder structure.
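The control flow of the polishing stage can be sketched as a loop. In the paper a single multi-task QA-MLM makes all three decisions; here they are split into two hypothetical callables, `find_improper` (returns the index of an improper character, or None when the poem is judged qualified) and `predict_char` (fills a masked position using the whole poem as context). The iteration cap is an added safeguard, not something the abstract mentions.

```python
from typing import Callable, List, Optional


def iterative_polish(
    draft: List[str],
    find_improper: Callable[[List[str]], Optional[int]],
    predict_char: Callable[[List[str], int], str],
    max_iters: int = 10,
) -> List[str]:
    """Repeatedly locate one improper character, mask it, and re-predict it."""
    poem = draft[:]
    for _ in range(max_iters):
        pos = find_improper(poem)
        if pos is None:  # the poem is judged qualified: stop automatically
            break
        masked = poem[:]
        masked[pos] = "[MASK]"
        # The masked LM sees context on both sides of the position.
        poem[pos] = predict_char(masked, pos)
    return poem


# Toy stand-ins that steer the draft towards a known reference line.
reference = list("床前明月光")


def toy_find_improper(poem: List[str]) -> Optional[int]:
    return next((i for i, (a, b) in enumerate(zip(poem, reference)) if a != b), None)


def toy_predict_char(masked: List[str], pos: int) -> str:
    return reference[pos]


print("".join(iterative_polish(list("床前白月光"), toy_find_improper, toy_predict_char)))
# prints 床前明月光
```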

  • Video Transcripts
  • 10.48448/y0ft-3662
Euphemistic Phrase Detection by Masked Language Model
  • Oct 23, 2021
  • Underline Science Inc.
  • Suma Bhat + 1 more

It is a well-known approach for fringe groups and organizations to use euphemisms (ordinary-sounding and innocent-looking words with a secret meaning) to conceal what they are discussing. For instance, drug dealers often use "pot" for marijuana and "avocado" for heroin. From a social media content moderation perspective, though recent advances in NLP have enabled the automatic detection of such single-word euphemisms, no existing work is capable of automatically detecting multi-word euphemisms, such as "blue dream" (marijuana) and "black tar" (heroin). Our paper tackles the problem of euphemistic phrase detection without human effort for the first time, as far as we are aware. We first perform phrase mining on a raw text corpus (e.g., social media posts) to extract quality phrases. Then, we utilize word embedding similarities to select a set of euphemistic phrase candidates. Finally, we rank those candidates by a masked language model, SpanBERT. Compared to strong baselines, we report 20-50% higher detection accuracies using our algorithm for detecting euphemistic phrases.
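The candidate-selection step (the second of the three) can be sketched as ranking mined phrases by embedding similarity to known euphemisms. The abstract does not say how phrase embeddings are built or how multiple seeds are combined; this sketch assumes precomputed phrase vectors and scores each phrase by its best cosine similarity to any seed vector. The vectors are made up for illustration, and the final SpanBERT ranking is not shown.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidates(phrase_vecs: Dict[str, Vector], seed_vecs: List[Vector], top_n: int) -> List[str]:
    """Keep the top_n mined phrases most similar to any seed euphemism."""
    scores = {p: max(cosine(v, s) for s in seed_vecs) for p, v in phrase_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


mined = {
    "blue dream": [0.9, 0.1],
    "black tar": [0.8, 0.3],
    "board meeting": [0.1, 0.9],
}
seeds = [[1.0, 0.0]]  # hypothetical vector near known single-word euphemisms such as "pot"
print(select_candidates(mined, seeds, top_n=2))
# prints ['blue dream', 'black tar']
```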
