Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications.

Han Xie, Belinda Zeng, Xiang Song, Trishul Chilimbi, Jun Ma, Yi Xu, Da Zheng, Houyu Zhang, Carl Yang, Vassilis N Ioannidis, Sheng Wang, Qing Ping

doi:10.1145/3580305.3599833

Abstract

Pre-training models on large text corpora has proven effective for a wide range of downstream NLP applications. In graph mining, the analogous approach is to pre-train graph models on large graphs in the hope of benefiting downstream graph applications, and several recent studies have explored it. However, no existing study has investigated pre-training text-plus-graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas. To address this problem, we propose a framework of graph-aware language model pre-training (GaLM) on a large graph corpus, which incorporates large language models and graph neural networks, together with a variety of fine-tuning methods for downstream applications. We conduct extensive experiments on Amazon's real internal datasets and large public datasets. Comprehensive empirical results and in-depth analysis demonstrate the effectiveness of our proposed methods, along with lessons learned.

Journal: KDD : proceedings. International Conference on Knowledge Discovery & Data Mining
Publication Date: Aug 4, 2023
Citations: 6
License type: cc-by-nc-nd
