A Joint Learning Model with Variational Interaction for Multilingual Program Translation

Abstract

Programs implemented in various programming languages form the foundation of software applications. To ease program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily follow a pairwise translation paradigm, learning to translate between pairs of languages from bilingual parallel data. However, parallel data is difficult to collect for some language pairs, and the distribution of program semantics across languages can shift, posing challenges for pairwise program translation. In this paper, we argue that jointly learning a unified model to translate code across multiple programming languages is superior to learning separately from bilingual parallel data. We propose Variational Interaction for Multilingual Program Translation (VIM-PT), a disentanglement-based generative approach that jointly trains a unified model for program translation across multiple languages. VIM-PT disentangles code into language-shared and language-specific features using variational inference and interaction information with a novel lower bound, then achieves program translation through conditional generation. VIM-PT offers four advantages: 1) it captures language-shared information more accurately from various implementations, improving the quality of multilingual program translation; 2) it mines and leverages non-parallel data; 3) it addresses the distribution shift of program semantics across languages; and 4) it serves as a single unified model, reducing deployment complexity.
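
The method is described here only at a high level, but the core disentangle-then-recombine idea can be sketched as follows. This is a minimal illustration under our own assumptions (a GRU encoder/decoder and a learned embedding per language as the language-specific feature), not VIM-PT's actual architecture, and it omits the interaction-information lower bound:

```python
# Hypothetical sketch: encode a program into a language-shared variational
# latent, then decode conditioned on the *target* language's specific code,
# so translation becomes conditional generation.
import torch
import torch.nn as nn

class DisentangledTranslator(nn.Module):
    def __init__(self, vocab=32000, dim=512, n_langs=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.to_shared = nn.Linear(dim, 2 * dim)      # posterior mean, log-variance
        self.lang_embed = nn.Embedding(n_langs, dim)  # language-specific feature
        self.decoder = nn.GRU(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, tgt_lang):
        _, h = self.encoder(self.embed(tokens))
        mu, logvar = self.to_shared(h[-1]).chunk(2, dim=-1)
        z_shared = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        cond = torch.cat([z_shared, self.lang_embed(tgt_lang)], dim=-1)
        dec_in = cond.unsqueeze(1).expand(-1, tokens.size(1), -1)
        logits = self.out(self.decoder(dec_in)[0])
        # standard KL term of the variational lower bound
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl
```

Because the shared latent is language-agnostic, one such model covers all language pairs: translating the same program into a different language only swaps `tgt_lang`.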

Similar Papers
  • Research Article
  • Citations: 4
  • 10.1155/2022/2146236
Leveraging a Joint Learning Model to Extract Mixture Symptom Mentions from Traditional Chinese Medicine Clinical Notes.
  • Jan 1, 2022
  • BioMed Research International
  • Yuxin Sun + 10 more

This paper addresses the mixture symptom mention problem that arises when structuring Traditional Chinese Medicine (TCM) clinical notes. We accomplish this by disassembling mixture symptom mentions with entity relation extraction. Over 2,200 clinical notes were annotated to construct the training set. An end-to-end joint learning model was then established to extract the entity relations, leveraging a multihead mechanism to deal with the problem of relation overlapping. A pretrained transformer encoder was adopted to capture context information. Compared with the entity extraction pipeline, the joint learning model was superior in recall, precision, and F1 measures, at 0.822, 0.825, and 0.818, respectively, 14% higher than the baseline model. The joint learning model extracts features automatically, without extra natural language processing tools, which makes it efficient for disassembling mixture symptom mentions. Its strong performance at identifying overlapping relations also benefits the downstream reassembling of separated symptom entities.
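
As a rough sketch of how a multihead mechanism can accommodate overlapping relations (our illustration, not the paper's implementation; the dimensions and relation count are invented):

```python
# Score every (head token, tail token, relation type) triple independently,
# so one symptom entity can participate in several relations at once.
import torch
import torch.nn as nn

class MultiHeadRelationScorer(nn.Module):
    def __init__(self, dim=768, n_relations=12):
        super().__init__()
        self.head_proj = nn.Linear(dim, dim)
        self.tail_proj = nn.Linear(dim, dim)
        self.rel = nn.Bilinear(dim, dim, n_relations)

    def forward(self, token_states):        # (B, T, dim), e.g. from a
        h = self.head_proj(token_states)    # pretrained transformer encoder
        t = self.tail_proj(token_states)
        B, T, D = h.shape
        h_exp = h.unsqueeze(2).expand(B, T, T, D).reshape(-1, D)
        t_exp = t.unsqueeze(1).expand(B, T, T, D).reshape(-1, D)
        scores = self.rel(h_exp, t_exp).view(B, T, T, -1)
        return scores.sigmoid()             # independent sigmoids allow overlap
```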

  • Research Article
  • Citations: 6
  • 10.47839/ijc.21.2.2595
Sound Context Classification based on Joint Learning Model and Multi-Spectrogram Features
  • Jun 30, 2022
  • International Journal of Computing
  • Dat Ngo + 5 more

This article presents a deep learning framework for Acoustic Scene Classification (ASC), the task of classifying environments from the sounds they produce. We first carry out a comprehensive analysis of spectrogram representations extracted from sound scene input, then propose the best multi-spectrogram combination for front-end feature extraction. For back-end classification, we propose a novel joint learning model using a parallel architecture of a Convolutional Neural Network (CNN) and a Convolutional Recurrent Neural Network (C-RNN), which can efficiently learn both the spatial features and the temporal sequence of a spectrogram input. The experimental results show our proposed framework to be general and robust for ASC tasks, with three main contributions. First, we identify the most effective spectrogram combination for specific datasets, which no previous publication had analyzed. Second, our joint CNN and C-RNN architecture achieves better performance than the CNN-only model proposed as the baseline in this paper. Finally, our framework achieves competitive performance compared with state-of-the-art systems on various benchmark datasets of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Tasks 1A & 1B, and LITIS Rouen.
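
A minimal sketch of the parallel back-end (layer sizes and the fusion rule are our assumptions, not the paper's exact configuration): both branches see the same spectrogram; the CNN captures spatial patterns, the C-RNN the temporal sequence, and their predictions are fused.

```python
import torch
import torch.nn as nn

class ParallelCnnCrnn(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(                      # spatial branch
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes))
        self.conv = nn.Sequential(                     # temporal branch front
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))           # pool freq, keep time
        self.rnn = nn.GRU(32, 64, batch_first=True, bidirectional=True)
        self.rnn_head = nn.Linear(128, n_classes)

    def forward(self, spec):                           # spec: (B, 1, freq, time)
        cnn_logits = self.cnn(spec)
        x = self.conv(spec).squeeze(2).transpose(1, 2) # (B, time, 32)
        _, h = self.rnn(x)
        crnn_logits = self.rnn_head(torch.cat([h[-2], h[-1]], dim=-1))
        return (cnn_logits + crnn_logits) / 2          # simple average fusion
```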

  • Research Article
  • Citations: 15
  • 10.1016/j.neucom.2021.02.036
A joint learning model for click-through prediction in display advertising
  • Mar 2, 2021
  • Neurocomputing
  • Mengjuan Liu + 5 more


  • Book Chapter
  • Citations: 154
  • 10.1007/978-3-319-24553-9_63
Automatic Localization and Identification of Vertebrae in Spine CT via a Joint Learning Model with Deep Neural Networks
  • Jan 1, 2015
  • Hao Chen + 6 more

Accurate localization and identification of vertebrae in 3D spinal images is essential for many clinical tasks. However, automatic localization and identification of vertebrae remain challenging due to the similar appearance of vertebrae, abnormal pathological curvatures, and image artifacts induced by surgical implants. Traditional methods relying on hand-crafted low-level features and/or a priori knowledge usually fail to overcome these challenges on arbitrary CT scans. We present a robust and efficient approach to automatically locating and identifying vertebrae in 3D CT volumes by exploiting high-level feature representations with a deep convolutional neural network (CNN). A novel joint learning model with CNN (J-CNN) is proposed that considers both the appearance of vertebrae and the pairwise conditional dependency of neighboring vertebrae. The J-CNN can effectively identify the type of vertebra and eliminate false detections based on a set of coarse vertebral centroids generated by a random forest classifier. Furthermore, the predicted centroids are refined by a shape regression model. Our approach was quantitatively evaluated on the dataset of the MICCAI 2014 Computational Challenge on Vertebrae Localization and Identification. Compared with the state-of-the-art method [1], our approach achieved a 10.12% improvement in identification rate and smaller localization errors.
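
The pairwise-dependency idea can be illustrated with a small decoding routine (our sketch, not the J-CNN code): combine per-centroid CNN label scores with a learned compatibility between neighboring vertebra labels, and decode by dynamic programming so the label sequence stays anatomically consistent.

```python
import numpy as np

def decode_vertebrae(unary, pairwise):
    """unary: (N, L) CNN scores for N centroids over L vertebra types;
    pairwise: (L, L) compatibility of neighboring labels."""
    N, L = unary.shape
    best = unary[0].copy()
    back = np.zeros((N, L), dtype=int)
    for i in range(1, N):
        cand = best[:, None] + pairwise     # (L, L): prev label -> cur label
        back[i] = cand.argmax(axis=0)
        best = cand.max(axis=0) + unary[i]
    labels = [int(best.argmax())]           # backtrack the best sequence
    for i in range(N - 1, 0, -1):
        labels.append(int(back[i][labels[-1]]))
    return labels[::-1]
```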

  • Supplementary Content
  • 10.17638/03032928
Learning Density Models via Structured Latent Variables
  • Feb 15, 2019
  • University of Liverpool
  • Xitong Yang

As one principal approach to machine learning and cognitive science, the probabilistic framework has been continuously developed both theoretically and practically. Learning a probabilistic model can be thought of as inferring plausible models to explain observed data. The learning process exploits random variables as building blocks, held together by probabilistic relationships. The key idea behind latent variable models is to introduce latent variables as powerful attributes that reveal data structures and explore the underlying features describing real-world data. Classical approaches use shallow architectures, including latent feature models and finite mixtures of latent variable models, and require assumptions about the form, structure, and distribution of the data. Since shallow forms may not describe data structures sufficiently, new types of latent structures have been developed within the probabilistic framework, sparking three main research interests: infinite latent feature models, mixtures of mixture models, and deep models. This dissertation summarises our work advancing the state of the art in both classical and emerging areas.

In the first block, a finite latent variable model with parametric priors is presented for clustering and is further extended into a two-layer mixture model for discrimination. These models embed dimensionality reduction in their learning tasks by designing a latent structure called common loading. Referred to as joint learning models, they attain a low-dimensional space that better matches the learning task, with the parameters optimised simultaneously for both the low-dimensional space and the model. However, these joint learning models must assume a fixed number of features and mixtures, normally tuned by trial and error. In general, simpler inference can be performed by fixing more parameters, but fixed parameters limit the flexibility of models, and false assumptions can even derive incorrect inferences from the data.

A richer model reduces the number of required assumptions. Therefore, an infinite tri-factorisation structure with non-parametric priors is proposed in the second block. This model can automatically determine an optimal number of features and leverage the interrelation between data and features.

In the final block, we introduce how to promote shallow latent structures to deep structures that handle richer structured data, through two tasks: a layer-wise model and a deep autoencoder-based model. In a deep density model, the knowledge of cognitive agents can be modelled using more complex probability distributions, while inference and parameter computation remain straightforward via a greedy layer-wise algorithm. The deep autoencoder-based joint learning model is trained end-to-end, requires no pre-training of the autoencoder network, and can be optimised by standard backpropagation without maximum a posteriori inference. Deep generative models are much more efficient than shallow architectures for unsupervised and supervised density learning tasks, and they can be developed and used in various practical applications.

  • Research Article
  • Citations: 6
  • 10.13053/cys-23-3-3247
Joint Learning of Named Entity Recognition and Dependency Parsing using Separate Datasets
  • Oct 7, 2019
  • Computación y Sistemas
  • Arda Akdemir + 1 more

Joint learning of different NLP-related tasks is an emerging research field in machine learning. Yet most recent joint learning models require a dataset annotated jointly for all the tasks involved, and such datasets are available only for frequently used languages. In this paper, we propose a novel BiLSTM-CRF based joint learning model for dependency parsing and named entity recognition, which to the best of our knowledge has not been employed before for Turkish. This enables joint learning of various tasks for languages with a limited amount of annotated data. Our model, tested on a frequently used NER dataset for Turkish, achieves results comparable with state-of-the-art systems. We also show that our proposed model outperforms the joint learning model that uses a single dataset.
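
A minimal sketch of this setup (simplified task heads; the CRF layer is omitted): one shared encoder with a head per task, trained on alternating batches from the two separately annotated datasets, so neither task needs joint annotation.

```python
import torch
import torch.nn as nn

class SharedEncoderJointModel(nn.Module):
    def __init__(self, vocab=20000, dim=128, n_ner_tags=9, n_dep_labels=40):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * dim, n_ner_tags)   # tags per token
        self.dep_head = nn.Linear(2 * dim, n_dep_labels) # labels per token

    def forward(self, tokens, task):
        h, _ = self.encoder(self.embed(tokens))          # shared features
        return self.ner_head(h) if task == "ner" else self.dep_head(h)

# Training alternates batches from the two separately annotated datasets:
#   for ner_batch, dep_batch in zip(ner_loader, dep_loader):
#       loss  = ce(model(ner_batch.x, "ner"), ner_batch.y)
#       loss += ce(model(dep_batch.x, "dep"), dep_batch.y)
```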

  • Research Article
  • Citations: 40
  • 10.1109/tnnls.2023.3264587
An End-to-End Framework for Joint Denoising and Classification of Hyperspectral Images.
  • Jul 1, 2023
  • IEEE Transactions on Neural Networks and Learning Systems
  • Xian Li + 3 more

Image denoising and classification are typically conducted separately and sequentially, each according to its own objective. In such a setup, where the two tasks are decoupled, the denoising operation does not optimally serve the classification task and sometimes even deteriorates it. We introduce a unified deep learning framework for joint denoising and classification of high-dimensional images, applied here to hyperspectral imaging. Earlier works on joint image denoising and classification are very scarce, and to the best of our knowledge, no deep learning models have yet been proposed or studied for this type of multitask image processing. A key component of our joint learning model is a compound loss function, designed so that the denoising and classification operations benefit each other iteratively during the learning process. Hyperspectral images (HSIs) are particularly challenging for both denoising and classification due to their high dimensionality and varying noise statistics across bands. We argue that a well-designed end-to-end deep learning framework for joint denoising and classification is superior to current deep learning approaches for processing HSI data, and we substantiate this with results on real HSI images in remote sensing. We show experimentally that the proposed joint learning framework substantially improves classification performance compared to common deep learning approaches in HSI processing; as a by-product, the denoising results are enhanced as well, especially in terms of semantic content, benefiting from the classification.
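
The compound-loss idea reduces to a weighted sum whose gradients flow into shared layers, so each task regularizes the other during training; a minimal sketch (the weighting scheme is our assumption, not the paper's exact design):

```python
import torch.nn.functional as F

def compound_loss(denoised, clean, logits, labels, lam=0.5):
    """Joint objective: denoising fidelity plus classification accuracy."""
    denoise_term = F.mse_loss(denoised, clean)    # pixel-level fidelity
    class_term = F.cross_entropy(logits, labels)  # semantic fidelity
    return denoise_term + lam * class_term        # lam balances the two tasks
```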

  • Research Article
  • Citations: 58
  • 10.1609/aaai.v34i05.6266
Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering
  • Apr 3, 2020
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Yang Deng + 6 more

Community question answering (CQA) has gained increasing popularity in both academia and industry. However, the redundancy and lengthiness of crowdsourced answers limit the performance of answer selection and lead to reading difficulties and misunderstandings for community users. To solve these problems, we tackle the tasks of answer selection and answer summary generation in CQA with a novel joint learning model. Specifically, we design a question-driven pointer-generator network that exploits the correlation between question-answer pairs to attend to essential information when generating answer summaries. Meanwhile, we leverage the answer summaries to alleviate noise in the original lengthy answers when ranking the relevancy of question-answer pairs. In addition, we construct a new large-scale CQA corpus, WikiHowQA, which contains long answers for answer selection as well as reference summaries for answer summarization. The experimental results show that the joint learning method effectively addresses the answer redundancy issue in CQA and achieves state-of-the-art results on both answer selection and text summarization tasks. Furthermore, the proposed model shows strong transferability and applicability to resource-poor CQA tasks that lack reference answer summaries.

  • Conference Article
  • Citations: 81
  • 10.18653/v1/d18-1504
Joint Learning for Targeted Sentiment Analysis
  • Jan 1, 2018
  • Dehong Ma + 2 more

Targeted sentiment analysis (TSA) aims at extracting targets and classifying their sentiment classes. Previous works exploit only word embeddings as features and do not explore the further potential of neural networks when jointly learning the two tasks. In this paper, we carefully design a hierarchical stacked bidirectional gated recurrent unit (HSBi-GRU) model to learn abstract features for both tasks, and we propose an HSBi-GRU based joint model that allows the target label to influence the sentiment label. Experimental results on two datasets show that our joint learning model outperforms other baselines and demonstrate the effectiveness of HSBi-GRU in learning abstract features.
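
A minimal sketch of the described coupling (our simplification, not the authors' code): stacked BiGRUs produce shared per-token features, and the target logits are concatenated into the sentiment head so the target decision can influence the sentiment decision.

```python
import torch
import torch.nn as nn

class TargetThenSentiment(nn.Module):
    def __init__(self, dim=100, n_target_tags=3, n_sentiments=4):
        super().__init__()
        self.gru1 = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.gru2 = nn.GRU(2 * dim, dim, batch_first=True, bidirectional=True)
        self.target_head = nn.Linear(2 * dim, n_target_tags)
        self.sent_head = nn.Linear(2 * dim + n_target_tags, n_sentiments)

    def forward(self, word_vecs):                  # (B, T, dim)
        h1, _ = self.gru1(word_vecs)
        h2, _ = self.gru2(h1)                      # hierarchical stack
        target_logits = self.target_head(h2)
        sent_in = torch.cat([h2, target_logits], dim=-1)
        return target_logits, self.sent_head(sent_in)
```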

  • Research Article
  • Citations: 24
  • 10.1186/s12859-021-04520-x
JLAN: medical code prediction via joint learning attention networks and denoising mechanism
  • Dec 1, 2021
  • BMC Bioinformatics
  • Xingwang Li + 5 more

Background: Clinical notes are documents that contain detailed information about the health status of patients, and they are generally accompanied by medical codes. However, manual diagnosis is costly and error-prone, and large clinical datasets are susceptible to noisy labels from erroneous manual annotation. Machine learning has therefore been utilized to perform automatic diagnosis. Previous state-of-the-art (SOTA) models used convolutional neural networks to build document representations for predicting medical codes. However, clinical notes usually follow a long-tailed distribution, and most models fail to deal with noise during code allocation. A denoising mechanism and long-tailed classification are therefore the keys to automated coding at scale.

Results: In this paper, a new joint learning model is proposed that extends our attention model for predicting medical codes from clinical notes. On the MIMIC-III-50 dataset, our model outperforms all baselines and SOTA models in all quantitative metrics. On the MIMIC-III-full dataset, our model outperforms the most advanced models in macro-F1, micro-F1, macro-AUC, and precision at 8. In addition, after introducing the denoising mechanism, the model converges faster and its overall loss is reduced.

Conclusions: The innovations of our model are threefold: first, code-specific representations can be identified by adopting the self-attention mechanism and the label attention mechanism; second, performance on long-tailed distributions can be boosted by introducing the joint learning mechanism; third, the denoising mechanism is suitable for reducing noise effects in medical code prediction. We evaluate the effectiveness of our model on the widely used MIMIC-III datasets and achieve new SOTA results.
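
The label attention component can be sketched as follows (an illustration consistent with the description, not JLAN's code; sizes are invented): each medical code owns a learned query that attends over the note's tokens, producing a code-specific representation.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    def __init__(self, dim=256, n_codes=50):
        super().__init__()
        self.code_queries = nn.Parameter(torch.randn(n_codes, dim))
        self.score = nn.Linear(dim, 1)

    def forward(self, token_states):               # (B, T, dim)
        attn = torch.einsum("cd,btd->bct", self.code_queries, token_states)
        attn = attn.softmax(dim=-1)                 # per-code attention over tokens
        code_repr = torch.einsum("bct,btd->bcd", attn, token_states)
        return self.score(code_repr).squeeze(-1)    # (B, n_codes) code logits
```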

  • Conference Article
  • Citations: 79
  • 10.24963/ijcai.2020/611
Multi-View Joint Graph Representation Learning for Urban Region Embedding
  • Jul 1, 2020
  • Mingyang Zhang + 3 more

The increasing amount of urban data enables us to investigate urban dynamics, assist urban planning, and, eventually, make our cities more livable and sustainable. In this paper, we focus on learning an embedding space for urban regions from urban data. For the first time, we propose a multi-view joint learning model to learn comprehensive and representative urban region embeddings. We first model different types of region correlations based on both human mobility and inherent region properties. Then, we apply a graph attention mechanism to learn region representations from each view of the built correlations. Moreover, we introduce a joint learning module that boosts region embedding learning by sharing cross-view information and fuses multi-view embeddings by learning adaptive weights. Finally, we exploit the learned embeddings in the downstream applications of land usage classification and crime prediction in urban areas with real-world data. Extensive experimental results demonstrate that our proposed joint learning model improves performance by a large margin on both tasks compared with state-of-the-art methods.
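
The adaptive-weight fusion can be sketched in a few lines (the dimensions and the scalar-weight-per-view choice are our assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class AdaptiveViewFusion(nn.Module):
    def __init__(self, n_views=3):
        super().__init__()
        self.view_logits = nn.Parameter(torch.zeros(n_views))

    def forward(self, view_embeddings):            # (n_views, N, dim)
        w = self.view_logits.softmax(dim=0)        # learned, adaptive weights
        return (w[:, None, None] * view_embeddings).sum(dim=0)  # fused (N, dim)
```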

  • Research Article
  • Citations: 10
  • 10.1016/j.jbi.2023.104318
Joint learning-based causal relation extraction from biomedical literature
  • Feb 11, 2023
  • Journal of Biomedical Informatics
  • Dongling Li + 5 more


  • Video Transcripts
  • 10.48448/9jk0-t783
CREAD: Combined Resolution of Ellipses and Anaphora in Dialogues
  • May 25, 2021
  • Underline Science Inc.
  • Lin Li + 6 more

Anaphora and ellipses are two common phenomena in dialogues. Without resolving referring expressions and information omission, dialogue systems may fail to generate consistent and coherent responses. Traditionally, anaphora is resolved by coreference resolution and ellipses by query rewrite. In this work, we propose a novel joint learning framework of modeling coreference resolution and query rewriting for complex, multi-turn dialogue understanding. Given an ongoing dialogue between a user and a dialogue assistant, for the user query, our joint learning model first predicts coreference links between the query and the dialogue context, and then generates a self-contained rewritten user query. To evaluate our model, we annotate a dialogue based coreference resolution dataset, MuDoCo, with rewritten queries. Results show that the performance of query rewrite can be substantially boosted (+2.3% F1) with the aid of coreference modeling. Furthermore, our joint model outperforms the state-of-the-art coreference resolution model (+2% F1) on this dataset.

  • Conference Article
  • Citations: 31
  • 10.3115/v1/n15-1079
Using External Resources and Joint Learning for Bigram Weighting in ILP-Based Multi-Document Summarization
  • Jan 1, 2015
  • Chen Li + 2 more

Some state-of-the-art summarization systems use integer linear programming (ILP) based methods that aim to maximize the important concepts covered in the summary. These concepts are often obtained by selecting bigrams from the documents. In this paper, we improve such bigram-based ILP summarization methods in several respects. First, we use syntactic information to select more important bigrams. Second, to estimate the importance of the bigrams, in addition to internal features based on the test documents (e.g., document frequency, bigram positions), we extract features by leveraging multiple external resources (such as word embeddings from an additional corpus, Wikipedia, DBpedia, WordNet, and SentiWordNet). The bigram weights are then trained discriminatively in a joint learning model that predicts the bigram weights and selects the summary sentences in the ILP framework at the same time. We demonstrate that our system consistently outperforms the prior ILP method on different TAC datasets and performs competitively compared to other previously reported best results. We also conducted various analyses to show the contributions of the different components.

  • Conference Article
  • 10.1109/cac.2015.7382565
Image hashing based on joint learning of multi-dimension features
  • Nov 1, 2015
  • Li Huanyu + 2 more

For the problem of image retrieval in computer vision, we propose a novel image hashing method based on Principal Component Analysis (PCA) and convolution filtering, realized by joint learning of multi-dimensional features. First, PCA eigenvectors are learned from a matrix constructed from patches extracted randomly from the original images; these eigenvectors serve as convolution filters. Next, to obtain a multi-dimensional feature representation, the original images are convolution-filtered into several groups according to the filter sequence. Then, a hash projection matrix and binary coding are learned by a traditional hashing operator in each dimension. Finally, the hash code of our joint learning model is obtained by merging the grouped binary codes. Extensive experiments on the widely used CIFAR-10 dataset validate the algorithm: the proposed image hashing performs well in image retrieval and, compared with traditional image hashing, offers a clear improvement in both precision and recall.
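
A sketch of the pipeline as described (parameter choices are ours, and a random projection stands in for the learned hash projection): learn PCA filters from random patches, convolve each image with them, and binarize each filter's responses into one group of hash bits before merging.

```python
import numpy as np
from scipy.signal import convolve2d

def pca_filters(images, patch=8, n_filters=8, n_patches=5000, seed=0):
    """Learn convolution filters as PCA eigenvectors of random patches.
    images: (N, H, W) grayscale array."""
    rng = np.random.default_rng(seed)
    H, W = images.shape[1:]
    idx = rng.integers(0, len(images), n_patches)
    ys = rng.integers(0, H - patch, n_patches)
    xs = rng.integers(0, W - patch, n_patches)
    patches = np.stack([images[i][y:y+patch, x:x+patch].ravel()
                        for i, y, x in zip(idx, ys, xs)])
    patches = patches - patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches, full_matrices=False)  # PCA eigenvectors
    return vt[:n_filters].reshape(n_filters, patch, patch)

def hash_code(image, filters, bits_per_filter=16, seed=0):
    """One binary code per filter dimension, then merge (concatenate)."""
    rng = np.random.default_rng(seed)
    codes = []
    for f in filters:
        resp = convolve2d(image, f, mode="same").ravel()
        proj = rng.standard_normal((bits_per_filter, resp.size))  # stand-in for
        codes.append(proj @ resp > 0)        # the learned hash projection matrix
    return np.concatenate(codes).astype(np.uint8)
```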
