Transformer Architecture Research Articles

Removing redundant parameters and computations before model training has attracted great interest because it reduces model storage, speeds up training and inference, and lowers energy consumption at run time. Simplifying deep neural networks also allows high-performance models to be deployed on resource-constrained edge devices, thus promoting the development of the intelligent world. However, current pruning-at-initialization methods perform poorly at extreme sparsity. To improve model performance under extreme sparsity, this paper proposes TEDEPR, a dual-grained lightweight strategy. This is the first use of tensor theory in a pruning-at-initialization method to optimize the structure of a sparse sub-network and improve its performance. First, at the coarse-grained level, we represent the model's weight matrices or weight tensors in low-rank tensor decomposition form and use multi-step chain operations to enhance the feature extraction capability of the base module, constructing a compact low-rank network. Second, at the fine-grained level, unimportant weights are pruned before training according to the trainability of the weights in the low-rank model, yielding the final compressed model. To evaluate TEDEPR, we conducted extensive experiments on the MNIST, UCF11, CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet datasets with LeNet, LSTM, VGGNet, ResNet and Transformer architectures, and compared it with state-of-the-art methods. The experimental results show that, under extreme sparsity, TEDEPR achieves higher accuracy, faster training and inference, and a smaller storage footprint than other pruning-at-initialization methods.
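The two-level idea in this abstract (a low-rank factorized layer, then pruning before any training) can be illustrated with a small sketch. This is not the TEDEPR implementation: the function names are hypothetical, a plain two-factor matrix factorization stands in for the tensor decomposition, the multi-step chain operations are omitted, and weight magnitude is used as a stand-in score because the abstract does not specify the trainability criterion.

```python
import random

def init_low_rank(m, n, rank, seed=0):
    """Randomly initialise factors U (m x rank) and V (rank x n), so W is modelled as U @ V."""
    rng = random.Random(seed)
    scale = 1.0 / rank ** 0.5
    U = [[rng.gauss(0.0, scale) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(rank)]
    return U, V

def prune_by_magnitude(factors, sparsity):
    """Return 0/1 masks that keep the largest-magnitude entries across all factors.

    Assumption: magnitude replaces the trainability-based score mentioned in
    the abstract, whose exact form is not given there.
    """
    scores = sorted(abs(w) for f in factors for row in f for w in row)
    n_prune = int(len(scores) * sparsity)
    threshold = scores[n_prune] if n_prune < len(scores) else float("inf")
    return [[[1 if abs(w) >= threshold else 0 for w in row] for row in f]
            for f in factors]

def count_kept(masks):
    """Count the weights that survive pruning."""
    return sum(v for mask in masks for row in mask for v in row)

# Coarse-grained step: a 64 x 32 dense layer replaced by rank-4 factors.
m, n, rank, sparsity = 64, 32, 4, 0.9
U, V = init_low_rank(m, n, rank)
dense_params = m * n                # parameters in the dense layer
low_rank_params = rank * (m + n)    # parameters in the factorized layer

# Fine-grained step: prune 90% of the factor weights before training.
masks = prune_by_magnitude([U, V], sparsity)
kept = count_kept(masks)
print(dense_params, low_rank_params, kept)
```

In this toy setting the factorization alone cuts the layer from 2048 to 384 parameters, and the masks then remove most of what remains, which is the sense in which the two granularities compound.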

Protein–RNA interactions are essential to many cellular functions, and missense mutations in RNA-binding proteins can disrupt these interactions, often leading to disease. To address this, we developed PRITrans, a deep learning model that predicts the effects of missense mutations on protein–RNA interactions, a task vital for understanding disease mechanisms and advancing molecular biology research. PRITrans employs a Transformer architecture enhanced with multiscale convolution modules for comprehensive feature extraction. Its primary innovation lies in integrating protein language model embeddings with a deep feature fusion strategy, effectively handling high-dimensional feature representations. Multi-layer self-attention mechanisms capture nuanced, high-level sequence information, while multiscale convolutions extract features across various depths, thereby enhancing predictive accuracy. Consequently, this architecture enables significant improvements in ΔΔG prediction compared to traditional approaches. We validated PRITrans using three different cross-validation strategies on two newly reconstructed mutation datasets, S315 and S630 (the latter containing 315 forward and 315 reverse mutations). PRITrans performed consistently well on both datasets, achieving a Pearson correlation coefficient of 0.741 and a root mean square error (RMSE) of 1.168 kcal/mol on the S630 dataset. Its performance extended to independent test sets, with a Pearson correlation of 0.699 and an RMSE of 1.592 kcal/mol. When tested against existing prediction methods on an independent dataset, PRITrans also showed improved predictive accuracy and robustness. These results underscore PRITrans's potential as a powerful tool for protein–RNA interaction studies.
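For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how a Pearson correlation coefficient and an RMSE are computed from paired measured and predicted ΔΔG values. The function names are illustrative and the numbers are made-up placeholders, not data from the PRITrans study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Placeholder values in kcal/mol, for illustration only.
measured = [1.0, -0.5, 2.0, 0.3]
predicted = [0.8, -0.1, 1.5, 0.6]
print(pearson(measured, predicted), rmse(measured, predicted))
```

Pearson correlation measures how well predictions track the trend of the measurements regardless of scale, while RMSE reports the typical size of the error in the same unit as ΔΔG, which is why the two are reported together.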


Articles published on Transformer Architecture

Medical Image Segmentation Review: The Success of U-Net.

OOD-CV-v2: An Extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images.

Dual-Grained Lightweight Strategy.

A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking.

MT-DSNet: Mix-mask teacher-student strategies and dual dynamic selection plug-in module for fine-grained image recognition

Optical Flow as Spatial-Temporal Attention Learners.

Artionyms and machine learning: auto naming of the paintings

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective.

PRITrans: A Transformer-Based Approach for the Prediction of the Effects of Missense Mutation on Protein–RNA Interactions

Architecture of new generation research institutions under reconstruction conditions

Image enhancement with art design: a visual feature approach with a CNN-transformer fusion model

EchoPT: A Pretrained Transformer Architecture that Predicts 2D In-Air Sonar Images for Mobile Robotics

Diversifying Multi-Head Attention in the Transformer Model

MolE: a foundation model for molecular graphs using disentangled attention.

Harnessing Artificial Intelligence for Wildlife Conservation

Continuous Evolution of Digital Twins using the DarTwin Notation

DC-Mamba: A Novel Network for Enhanced Remote Sensing Change Detection in Difficult Cases

CovTransformer: A transformer model for SARS-CoV-2 lineage frequency forecasting

LEM-Detector: An Efficient Detector for Photovoltaic Panel Defect Detection

Data-Based Prediction of Redox Potentials via Introducing Chemical Features into the Transformer Architecture.
