Problem Of Language Translation Research Articles

The ‘inverse problem’ of mass spectrometric molecular identification (‘given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came’) is largely unsolved, and is especially acute in metabolomics, where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem (‘calculate a small molecule’s likely fragmentation and hence at least some of its mass spectrum from its structure alone’) is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the ‘translation’ a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. Here we present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed or explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the ‘true’ molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are ‘similar’ to the top hit.
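To make the translation framing concrete, the following is a minimal sketch of how one (peak list, SMILES) pair could be turned into source and target token sequences. It is an illustration under stated assumptions, not MassGenie's actual preprocessing: the function names, the fixed four-decimal peak tokens, and the regular expression are all hypothetical, and the tokenizer covers only a common subset of SMILES syntax.

```python
import re

# Assumed tokenizer: bracket atoms, two-letter halogens, single-letter atoms,
# aromatic atoms, bond/branch symbols and ring-closure digits.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|[()=#+\-\\/.:@%]|\d")


def peaks_to_tokens(peaks, decimals=4):
    """Source 'sentence': each m/z value becomes one fixed-precision string token."""
    return [f"{mz:.{decimals}f}" for mz in sorted(peaks)]


def smiles_to_tokens(smiles):
    """Target 'sentence': split a SMILES string into atom and symbol tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse strings containing characters the assumed tokenizer does not know.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def make_training_pair(peaks, smiles):
    """Bundle one (source, target) example for a sequence-to-sequence model."""
    return peaks_to_tokens(peaks), smiles_to_tokens(smiles)


# Illustrative values only; these peaks are not measured data.
source, target = make_training_pair([180.0634, 163.0390], "CC(=O)Cl")
```

Treating each peak as a single string token is only one possible design; a real system might instead bin the masses or embed them numerically.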
In addition to using the ‘top hits’ directly, we can produce a rank order of these by ‘round-tripping’ candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to ‘learn’ millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
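One plausible reading of the ‘round-tripping’ idea is sketched below: each candidate structure is passed through a forward fragmenter, and candidates are ranked by how well the predicted peaks reproduce the observed ones. The abstract does not specify the scoring used, so the overlap score, the tolerance, and every name here (`peak_overlap`, `rank_by_round_trip`, `fragment_fn`) are assumptions, and the fragmenter is a toy lookup table with invented numbers.

```python
def peak_overlap(observed, predicted, tol=0.005):
    """Fraction of observed peaks matched by some predicted peak within tol (m/z units)."""
    observed = list(observed)
    predicted = list(predicted)
    if not observed:
        return 0.0
    matched = sum(any(abs(o - p) <= tol for p in predicted) for o in observed)
    return matched / len(observed)


def rank_by_round_trip(observed, candidates, fragment_fn, tol=0.005):
    """Score each candidate by fragmenting it in silico; best match first."""
    scored = [(c, peak_overlap(observed, fragment_fn(c), tol)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for an in silico fragmenter: a lookup table of
# invented peak lists keyed by placeholder candidate labels.
TOY_FRAGMENTS = {
    "candidate_A": [100.0, 50.0],
    "candidate_B": [100.0, 75.0, 50.0],
}

observed_peaks = [100.001, 75.002, 50.0]
ranking = rank_by_round_trip(observed_peaks, TOY_FRAGMENTS, TOY_FRAGMENTS.get)
```

In this toy case `candidate_B` ranks first because it accounts for all three observed peaks, whereas `candidate_A` accounts for only two.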

In the history of artificial neural networks, LSTMs have proved to be a high-performance architecture for learning sequential data. Although LSTMs are remarkable at learning sequential data, they are limited in their ability to learn long-term dependencies and to represent certain data structures because they lack external memory. In this paper, we tackle two main tasks: language translation and image captioning. We approach the problem of language translation by leveraging the capabilities of the recently developed DNC architectures. We modify the DNC architecture by including dual neural controllers instead of one, together with an external memory module. Inside our controller, we employ a memory-augmented neural network. Unlike the original differentiable neural computer, we implement a dual-controller system in which one controller encodes the query sequence and the other decodes the translated sequence. During the encoding cycle, new inputs are read and the memory is updated accordingly. In the decoding cycle, the memory is protected from any writing by the decoding controller. Thus, the decoder phase generates the translated sequence one time step at a time. The proposed dual-controller memory-augmented neural network is then trained and tested on the Europarl dataset. For the image captioning task, our architecture is inspired by an end-to-end image captioning model in which the CNN’s output is passed to the RNN as input only once and the RNN generates words depending on that input. We trained our DNC captioning model on the 2015 MSCOCO dataset. Finally, we compare our architecture with the conventionally used LSTM and NTM architectures and show its superiority.
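The encode-then-protect behaviour described above can be illustrated with a deliberately simplified toy. This is not a differentiable neural computer: there are no learned controllers, usage vectors, or temporal links, writes go to slots in round-robin order, and reads use plain softmax content addressing. The class and function names are hypothetical. The sketch shows only the memory access pattern, in which memory is writable during encoding and read-only during decoding.

```python
import math


class ExternalMemory:
    """Toy slot memory with a write-protection flag (not a real DNC memory)."""

    def __init__(self, slots, width):
        self.rows = [[0.0] * width for _ in range(slots)]
        self.write_protected = False
        self._next = 0

    def write(self, vector):
        if self.write_protected:
            raise PermissionError("memory is read-only during decoding")
        self.rows[self._next % len(self.rows)] = list(vector)
        self._next += 1

    def read(self, key):
        """Content-based read: softmax over dot products, then a weighted sum of rows."""
        scores = [sum(k * m for k, m in zip(key, row)) for row in self.rows]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        width = len(self.rows[0])
        return [
            sum(w * row[i] for w, row in zip(weights, self.rows)) / total
            for i in range(width)
        ]


def encode(memory, inputs):
    """Encoding cycle: every new input updates the memory."""
    memory.write_protected = False
    for vector in inputs:
        memory.write(vector)


def decode(memory, keys):
    """Decoding cycle: memory is protected, one read per output time step."""
    memory.write_protected = True
    return [memory.read(key) for key in keys]


memory = ExternalMemory(slots=2, width=2)
encode(memory, [[1.0, 0.0], [0.0, 1.0]])
outputs = decode(memory, [[10.0, 0.0]])
```

After `decode` is called, any further `memory.write(...)` raises `PermissionError`, mirroring the statement that the decoding controller cannot modify what the encoder stored.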

Articles published on Problem Of Language Translation

MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra.

Advances in machine translation for sign language: approaches, limitations, and challenges

Substructure-based neural machine translation for retrosynthetic prediction

Employing Differentiable Neural Computers for Image Captioning and Neural Machine Translation

A Better Gauging Model for the Evaluation of Automatic Machine Translation of English – Hindi Language

Identifying Semantics in Clinical Reports Using Neural Machine Translation

Neural Sign Language Translation Based on Human Keypoint Estimation

Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks.

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network.

Cognitive testing of physical activity and acculturation questions in recent and long-term Latino immigrants

The Proper Place of Men and Machines in Language Translation
