Abstract

The birth of the Transformer marked the start of a new chapter in the deep learning era. Through an encoder-decoder architecture built on components such as residual connections and multi-head self-attention, it reshaped deep models and unified the architectures used for traditional computer vision (CV) and natural language processing (NLP) problems. In recent years, many published papers have adapted the original Transformer to better handle tasks in time series analysis, CV, and NLP. In NLP, Bidirectional Encoder Representations from Transformers (BERT) employs a bidirectional Transformer structure to learn context-based language representations, whereas the Generative Pre-trained Transformer (GPT) employs a unidirectional Transformer but relies on larger-scale corpus pre-training to improve model performance. The Vision Transformer is a cornerstone of Transformer-based computer vision: it splits the input image into patches, projects each patch into a vector of features, and then passes these vectors to the Transformer. Building on the idea of the Vision Transformer, Swin Transformer and BiFormer further optimized the architecture and achieved better results. Transformer models for time series combine the ideas developed in CV and NLP and adapt them to the specific difficulties of time series problems, lowering algorithmic complexity and increasing prediction accuracy. This article summarizes the uses and improvements of the Transformer in NLP, CV, and time series, traces the development history and the ideas behind algorithmic optimization, and predicts potential developments of the Transformer in these three fields.
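To make the patch-embedding step mentioned above concrete, the following minimal sketch (assuming PyTorch; the patch size, embedding dimension, and module names are illustrative choices, not the exact configuration of any surveyed model) shows how an image can be split into patches, projected into token vectors, and passed to a standard Transformer encoder.

```python
# Minimal illustrative sketch; assumes PyTorch. Hyperparameters are placeholders.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each patch to a token vector."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and applying a linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (batch, channels, height, width)
        x = self.proj(x)                  # (batch, embed_dim, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)  # (batch, num_patches, embed_dim)
        return x

# The resulting patch tokens can then be fed to a Transformer encoder.
patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))  # (1, 196, 768)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
out = encoder(patches)  # (1, 196, 768)
```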
