Representation Of Sequences Research Articles

Retrosynthesis is the task of predicting reactant molecules from a given product molecule and is important in organic chemistry because the identification of a synthetic path is as demanding as the discovery of new chemical compounds. Recently, the retrosynthesis task has been solved automatically, without human expertise, using powerful deep learning models. Recent deep models are primarily based on seq2seq or graph neural networks, depending on whether the molecule is represented as a sequence or as a graph. Current state-of-the-art models represent a molecule as a graph, but they require joint training with auxiliary prediction tasks, such as predicting the most probable reaction template or the reaction center. Furthermore, they require additional labels from experienced chemists, thereby incurring additional cost. Herein, we propose a novel template-free model, Graph Truncated Attention (GTA), which leverages both sequence and graph representations by inserting graphical information into a seq2seq model. The proposed GTA model masks the self-attention layer using the adjacency matrix of the product molecule in the encoder, and applies a new loss to the cross-attention layer in the decoder using atom mapping acquired from an automated algorithm. Our model achieves new state-of-the-art records: exact match top-1 and top-10 accuracies of 51.1% and 81.6%, respectively, on the USPTO-50k benchmark dataset, and 46.0% and 70.0%, respectively, on the USPTO-full dataset, both without any reaction class information. The GTA model surpasses prior graph-based template-free models by 2% and 7% in top-1 and top-10 accuracy, respectively, on the USPTO-50k dataset, and by over 6% in both top-1 and top-10 accuracy on the USPTO-full dataset.
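To make the encoder-side idea concrete, the following is a minimal sketch of self-attention masked by a molecular adjacency matrix. It is not the authors' implementation: it assumes a single attention head, no learned query/key/value projections, and that each atom may also attend to itself. The function name `masked_self_attention` and the toy three-atom chain are hypothetical, chosen only for illustration.

```python
import math


def masked_self_attention(features, adjacency):
    """Scaled dot-product self-attention restricted to bonded atoms.

    features  : list of n atom feature vectors (each a list of floats)
    adjacency : n x n list of 0/1 entries, 1 where two atoms share a bond
    Assumption: atom i may always attend to itself (self-loop).
    Returns (weights, outputs).
    """
    n = len(features)
    d = len(features[0])
    weights, outputs = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j or adjacency[i][j] == 1:
                dot = sum(a * b for a, b in zip(features[i], features[j]))
                scores.append(dot / math.sqrt(d))
            else:
                # Masked position: receives zero weight after the softmax.
                scores.append(-math.inf)
        # Numerically stable softmax; the self-loop keeps the max finite.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Output for atom i is the attention-weighted sum of feature vectors.
        outputs.append(
            [sum(row[j] * features[j][c] for j in range(n)) for c in range(d)]
        )
    return weights, outputs


# Toy example: a three-atom chain 0-1-2, so atoms 0 and 2 are not bonded.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = masked_self_attention(features, adjacency)
print(weights[0][2], weights[2][0])  # both masked entries are 0.0
```

In this sketch the mask only removes attention between non-bonded atoms; every row of `weights` still sums to one, so the layer remains an ordinary attention layer over a smaller neighbourhood.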

Enhancers are important functional elements in genome sequences. Identifying enhancers is very challenging because enhancer sequences are highly diverse and their locations on the genome are flexible. To date, the interactions between enhancers and genes are not fully understood. To speed up studies of the regulatory roles of enhancers, computational tools for enhancer prediction have emerged in recent years. In particular, thanks to the ENCODE project and advances in high-throughput experimental techniques, a large number of experimentally verified enhancers have been annotated on the human genome, which allows large-scale prediction of unknown enhancers using data-driven methods. However, except for human and some model organisms, validated enhancer annotations are scarce for most species, which makes the computational identification of enhancers in their genomes more difficult. In this study, we propose a deep learning-based predictor for enhancers, named CrepHAN, which features a hierarchical attention neural network and word embedding-based representations of DNA sequences. We use experimentally supported data from the human genome to train the model, and perform experiments on human and other mammals, including mouse, cow and dog. The experimental results show that CrepHAN is particularly advantageous for cross-species prediction and outperforms existing models by a large margin. In particular, for human-mouse cross-predictions, the area under the receiver operating characteristic (ROC) curve (AUC) increases by 0.033 to 0.145 on the combined tissue dataset and by 0.032 to 0.109 on tissue-specific datasets. Availability: bcmi.sjtu.edu.cn/∼yangyang/CrepHAN.html. Supplementary data are available at Bioinformatics online.
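As an illustration of what a word-based representation of DNA can look like, the sketch below splits a sequence into overlapping k-mers ("words"), groups them into fixed-size segments ("sentences") of the kind a hierarchical attention network could consume, and maps each word to an integer id for an embedding lookup. The abstract does not specify these details: the use of k-mers, the values of `k` and `words_per_sentence`, and all function names here are assumptions made for illustration.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA string into k-mers taken every `stride` bases."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def group_into_sentences(tokens, words_per_sentence=2):
    """Chunk the word list into consecutive segments; the last may be shorter."""
    return [
        tokens[i:i + words_per_sentence]
        for i in range(0, len(tokens), words_per_sentence)
    ]


def build_vocab(sentences):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


tokens = kmer_tokens("ACGTAC", k=3)
sentences = group_into_sentences(tokens, words_per_sentence=2)
vocab = build_vocab(sentences)
ids = [[vocab[word] for word in sentence] for sentence in sentences]
print(tokens)     # ['ACG', 'CGT', 'GTA', 'TAC']
print(sentences)  # [['ACG', 'CGT'], ['GTA', 'TAC']]
print(ids)        # [[0, 1], [2, 3]]
```

The integer ids would index rows of an embedding matrix; attention would then be applied first over the words in each segment and then over the segments, which is the general shape of a hierarchical attention model.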

Articles published on Representation Of Sequences

In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins.

Learning the Regulatory Code of Gene Expression.

Grammar guided embedding based Chinese long text sentiment classification

FEGS: a novel feature extraction model for protein sequences and its applications

Multiword units lead to errors of commission in children's spontaneous production: "What corpus data can tell us?*".

Comparative analysis and prediction of nucleosome positioning using integrative feature representation and machine learning algorithms

The graph-based behavior-aware recommendation for interactive news

Sequence learning recodes cortical representations instead of strengthening initial ones.

DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence.

The sequence form of accounting for atrocity

A Lightweight Neural Model for Biomedical Entity Linking

GTA: Graph Truncated Attention for Retrosynthesis

Static-Dynamic Interaction Networks for Offline Signature Verification

GDPNet: Refining Latent Multi-View Graph for Relation Extraction

CrepHAN: cross-species prediction of enhancers by using hierarchical attention networks.

Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction.

Rethinking the ST-GCNs for 3D skeleton-based human action recognition

Real-time crowd behavior recognition in surveillance videos based on deep learning methods

Convolutional neural networks with image representation of amino acid sequences for protein function prediction

Learning audio sequence representations for acoustic event classification
