Multimedia data continues to grow explosively and exhibits multi-source, heterogeneous characteristics, so research on cross-modal learning has attracted increasing attention from both academia and industry. Cross-modal representation and cross-modal generation are the two core problems of cross-modal learning. Cross-modal representation aims to exploit the complementarity among modalities and remove inter-modal redundancy so as to obtain more effective feature representations; cross-modal generation exploits the semantic consistency between modalities to convert data from one modal form into another, which helps improve transfer ability across modalities. This paper systematically analyzes important recent international and domestic research progress in cross-modal representation and generation, covering traditional cross-modal representation learning, representation learning with large multimodal models, image-to-text cross-modal conversion, and cross-modal image generation. Traditional cross-modal representation learning is discussed in terms of unified (joint) and coordinated representations; large multimodal model representation learning focuses on Transformer-based models; image-to-text conversion covers developments in image and video captioning, video caption semantic analysis, and visual question answering; and cross-modal image generation is presented in terms of joint representation of multi-modal information, cross-modal image generation techniques, and pre-training-based domain-specific image generation. The paper reviews in detail the challenges in each of these subfields, compares domestic and international progress, and traces the lines of development and the research frontier. Finally, based on this analysis, it outlines future trends and potential breakthroughs in cross-modal representation and generation.

Nowadays, with the explosive growth of multimedia data, its multi-source and multi-modal character has become a challenging problem in multimedia research. Representation and generation can be regarded as two key problems in cross-modal learning research. Cross-modal representation studies feature learning and information integration methods for multi-modal data; to obtain more effective feature representations, the complementarity between modalities must be exploited. Cross-modal generation focuses on the knowledge transfer mechanism across modalities: the semantic consistency between modalities can be used to convert data between different modal forms, which is beneficial for improving cross-modal transfer ability. The literature on cross-modal representation and generation is critically analyzed with respect to 1) traditional cross-modal representation learning, 2) large-model cross-modal representation learning, 3) image-to-text cross-modal conversion, and 4) cross-modal image generation. Traditional cross-modal representation falls into two categories: joint representation and coordinated representation. Joint representation maps multiple single-modal inputs into a shared representation space after each modality is processed separately, whereas coordinated representation keeps one representation per modality and learns them jointly under similarity constraints.
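The distinction drawn above between joint and coordinated representation can be sketched in a few lines of code. This is a toy illustration only, not a method from the surveyed literature; all dimensions and projection matrices are arbitrary stand-ins for learned encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-modal features (dimensions are arbitrary).
image_feat = rng.normal(size=4)   # e.g. output of a vision encoder
text_feat = rng.normal(size=6)    # e.g. output of a text encoder

# Joint representation: fuse both modalities into ONE shared vector
# (here, concatenation followed by a single linear projection).
W_joint = rng.normal(size=(8, 10))
joint = W_joint @ np.concatenate([image_feat, text_feat])

# Coordinated representation: keep SEPARATE embeddings per modality,
# coupled only through a similarity constraint (here, cosine similarity).
W_img = rng.normal(size=(8, 4))
W_txt = rng.normal(size=(8, 6))
z_img = W_img @ image_feat
z_txt = W_txt @ text_feat

def cosine(a, b):
    """Cosine similarity; a training loss would push this toward 1
    for matched image-text pairs and toward -1/0 for mismatched pairs."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine(z_img, z_txt)
```

In practice the projections would be deep networks trained end to end, but the structural difference is the same: one fused vector versus per-modality vectors tied by a similarity objective.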
Self-supervised learning with deep neural networks (DNNs), especially Transformer-based methods, makes it possible to exploit large-scale unlabeled data. Extending the supervised learning paradigm, large pre-trained models first learn from large-scale unlabeled data, and a small amount of labeled data from downstream tasks is then used for fine-tuning. Compared with models trained for specific tasks, pre-trained models offer better versatility and transfer ability, and the fine-tuned models can further optimize downstream tasks. The development of image-to-text conversion (a.k.a. image captioning or video captioning) methods is summarized, including end-to-end, semantic-based, and style-based methods. In addition, the current state of cross-modal conversion between image and text is analyzed, covering image captioning, video captioning, and visual question answering. Cross-modal generation methods are likewise summarized with respect to the joint representation of cross-modal information, image generation, text-to-image cross-modal generation, and cross-modal generation based on pre-trained models. In recent years, generative adversarial networks (GANs) and denoising diffusion probabilistic models (DDPMs) have been driving progress in cross-modal generation tasks. Thanks to the strong adaptability and generation ability of DDPMs, cross-modal generation research has advanced, and the problem of fragile texture synthesis has been alleviated to a certain extent. The growth of GAN-based and DDPM-based methods is summarized and analyzed further.
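As a minimal illustration of the DDPM family mentioned above, the forward (noising) process can be sketched as follows. This is an assumption-laden toy using a standard linear beta schedule, not a specific model from the survey; the denoising network itself is omitted:

```python
import numpy as np

# Linear beta schedule over T steps (common choice; values illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product: abar_t = prod(alpha_1..alpha_t)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))            # toy "image"
x_noisy, eps = q_sample(x0, t=T - 1, rng=rng)
# At large t, alpha_bar[t] is near 0, so x_t is close to pure noise;
# training teaches a network to predict eps from (x_t, t), and sampling
# reverses the chain step by step. Text-to-image generation conditions
# this denoising network on a text embedding.
```

The strong mode coverage of this noising/denoising formulation, compared with adversarial training, is one reason DDPMs have been adopted for cross-modal generation.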