Abstract

Query Rewriting (QR) addresses the word mismatch between queries and documents in Web search. Existing approaches usually model QR with an end-to-end sequence-to-sequence (seq2seq) model. State-of-the-art Transformer-based models can effectively learn textual semantics from user session logs, but they often ignore users’ geographic location information, which is crucial for the Point-of-Interest (POI) search of map services. In this paper, we propose a pre-training model, called Geo-BERT, that integrates semantic and geographic information in the pre-trained representations of POIs. First, we model the real-world distribution of POIs as a graph, in which nodes represent POIs and geographic units of multiple granularities. Then we use graph representation learning methods to get geographic representations. Finally, we train a BERT-like pre-training model with text and POIs’ graph embeddings to get an integrated representation of both geographic and semantic information, and apply it to QR in POI search. The proposed model achieves excellent accuracy on a wide range of real-world datasets of map services.

Highlights

  • Usually, incorporating external knowledge could enhance the performance of NLP tasks (Liu et al., 2020; Zhou et al., 2020; Han et al., 2018)

  • Queries in POI search may contain administrative region information, e.g. city, district and road, so we consider constructing a fine-grained geographic graph

  • Capturing the geographic information corresponding to the query becomes crucial to Query Rewriting (QR) tasks in POI search


Summary

Background

Usually, incorporating external knowledge could enhance the performance of NLP tasks (Liu et al., 2020; Zhou et al., 2020; Han et al., 2018). Queries in POI search may contain administrative region information, e.g. city, district and road, so we consider constructing a fine-grained geographic graph. The graph is based on the neighborhood relationship between POIs, and fuses the inclusion relationship between administrative regions. It is unweighted for the following two reasons: (1) we do not know the path between two POIs, because complete map information is lacking; (2) we hope to simplify the graph to make the learned representations more robust. We use graph embedding algorithms, e.g. node2vec (Grover and Leskovec, 2016), to get node representations that contain geographic information.

Information of both tokens and geographic entities. σ(·) is a non-linear activation function, which is set as GELU (Hendrycks and Gimpel, 2016) in the experiments.
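
The unweighted geographic graph described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the node names, the three input structures, and the choice to attach each POI to its enclosing road are all assumptions made for illustration. The walks are sampled uniformly, which corresponds to node2vec with p = q = 1; in a full pipeline the walks would then be passed to a skip-gram model to obtain node embeddings.

```python
import random
from collections import defaultdict


def build_geo_graph(poi_neighbors, poi_region, region_parent):
    """Build an unweighted, undirected graph stored as adjacency sets."""
    graph = defaultdict(set)

    def add_edge(u, v):
        graph[u].add(v)
        graph[v].add(u)

    # Neighborhood relationship between POIs.
    for a, b in poi_neighbors:
        add_edge(a, b)
    # Assumption: each POI is linked to the finest region that contains it.
    for poi, region in poi_region.items():
        add_edge(poi, region)
    # Inclusion relationship between administrative regions.
    for child, parent in region_parent.items():
        add_edge(child, parent)
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Sample uniform random walks (node2vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                # Every node has at least one neighbor, because nodes
                # are only ever created by adding an edge.
                neighbors = sorted(graph[walk[-1]])
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy data: three POIs, two roads, one district, one city.
poi_neighbors = [("poi:cafe_a", "poi:station_b"), ("poi:station_b", "poi:mall_c")]
poi_region = {
    "poi:cafe_a": "road:r1",
    "poi:station_b": "road:r1",
    "poi:mall_c": "road:r2",
}
region_parent = {
    "road:r1": "district:d1",
    "road:r2": "district:d1",
    "district:d1": "city:c1",
}

graph = build_geo_graph(poi_neighbors, poi_region, region_parent)
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

Because POIs and administrative regions share one graph, a walk that starts at a POI can pass through its road, district, or city, so nearby POIs and POIs in the same region tend to appear in similar walk contexts.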

