Abstract

Unsupervised dependency parsing has become increasingly popular in recent years because it does not require expensive annotations, such as treebanks, which supervised and semi-supervised dependency parsing depend on. However, its accuracy is still far below that of supervised dependency parsers, partly because the parsing models are insufficient to capture the linguistic phenomena underlying the text. The performance of unsupervised dependency parsing can be improved by mining knowledge from text and incorporating it into the model. In this article, syntactic knowledge is acquired from query logs to help estimate better probabilities in dependency models with valence. The proposed method is language independent and obtains an improvement of 4.1% unlabeled accuracy on the Penn Chinese Treebank by utilizing additional dependency relations from the Sogou and Baidu query logs. Moreover, experiments show that the proposed model achieves an improvement of 8.07% on the CoNLL 2007 English data using the AOL query logs. We believe query logs are a useful source of syntactic knowledge for many natural language processing (NLP) tasks.
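
As a rough illustration of how query-log knowledge could feed into the attachment probabilities of a dependency model with valence (DMV), the sketch below interpolates EM-estimated CHOOSE probabilities with relative frequencies of head-dependent pairs counted from parsed query logs. The helper names, the (head, dependent, direction) representation, and the interpolation weight are illustrative assumptions, not the paper's actual estimation procedure.

# Hypothetical sketch: mixing DMV attachment (CHOOSE) probabilities with
# counts of head-dependent relations mined from query logs. All names and the
# interpolation weight are assumptions for illustration only.
from collections import defaultdict


def attachment_counts(parsed_queries):
    """Count (head_tag, dep_tag, direction) triples from dependency-parsed query logs."""
    counts = defaultdict(float)
    for arcs in parsed_queries:  # each query: a list of (head_tag, dep_tag, direction) arcs
        for head, dep, direction in arcs:
            counts[(head, dep, direction)] += 1.0
    return counts


def interpolate_choose(p_dmv, log_counts, lam=0.7):
    """Interpolate DMV CHOOSE probabilities with query-log relative frequencies.

    p_dmv:      dict mapping (head, dep, direction) -> probability from an EM-trained DMV
    log_counts: raw counts from query logs (see attachment_counts)
    lam:        weight on the DMV estimate (assumed value; would be tuned on held-out data)
    """
    # Normalise query-log counts within each (head, direction) context.
    totals = defaultdict(float)
    for (head, dep, direction), c in log_counts.items():
        totals[(head, direction)] += c

    mixed = {}
    for (head, dep, direction), p in p_dmv.items():
        denom = totals.get((head, direction), 0.0)
        p_log = log_counts.get((head, dep, direction), 0.0) / denom if denom > 0 else 0.0
        mixed[(head, dep, direction)] = lam * p + (1.0 - lam) * p_log
    return mixed


if __name__ == "__main__":
    # Toy example: a verb head attaching dependents to the right.
    p_dmv = {("VV", "NN", "right"): 0.4, ("VV", "AD", "right"): 0.6}
    parsed_queries = [[("VV", "NN", "right")], [("VV", "NN", "right")], [("VV", "AD", "right")]]
    print(interpolate_choose(p_dmv, attachment_counts(parsed_queries)))

In practice the mixed probabilities would replace the model's CHOOSE parameters before the next round of inference, so that frequent relations in the query logs nudge the unsupervised estimates toward attested attachments.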
