Abstract

Wikipedia's category graph is a network of 300,000 interconnected category labels and can be a powerful resource for many classification tasks. However, its size and lack of organization can make it difficult to navigate. In this paper, we present a new algorithm that efficiently explores this graph and accurately ranks classification labels given user-specified keywords. We highlight several variations of this algorithm and study their impact on the classification results in order to determine the best way to exploit the category graph. We implement our algorithm as the core of a query classification system and demonstrate its reliability using the KDD Cup 2005 and TREC 2007 competitions as benchmarks.
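The sketch below illustrates one generic way to rank labels from a category graph. It is not the algorithm presented in this paper. Seed categories, assumed to have already been matched from the user's keywords, pass a decaying weight up to their parent categories for a bounded number of steps, and target categories that receive weight are ranked by it. The function `rank_labels`, its `decay` and `max_depth` parameters, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def rank_labels(parents, seeds, targets, max_depth=3, decay=0.5):
    """Hypothetical sketch: rank target labels by weight propagated up a category graph.

    parents:   dict mapping a category to a list of its parent categories.
    seeds:     dict mapping seed categories (assumed matched from keywords) to weights.
    targets:   set of categories that act as classification labels.
    max_depth: number of upward steps to explore from the seeds.
    decay:     factor applied to the weight at every upward step.
    """
    scores = {}
    visited = set()  # guards against cycles: each category is processed once
    frontier = dict(seeds)
    for _ in range(max_depth + 1):
        next_frontier = defaultdict(float)
        for category, weight in frontier.items():
            if category in visited:
                continue
            visited.add(category)
            # A category is scored only at the shallowest depth it is reached,
            # with weight summed over all paths of that length.
            if category in targets:
                scores[category] = weight
            for parent in parents.get(category, ()):
                next_frontier[parent] += weight * decay
        frontier = next_frontier
        if not frontier:
            break
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy graph (hypothetical data, not taken from Wikipedia).
    toy_parents = {
        "Python (programming language)": ["Programming languages"],
        "Programming languages": ["Computing"],
        "Snakes": ["Reptiles"],
        "Reptiles": ["Animals"],
    }
    toy_seeds = {"Python (programming language)": 1.0, "Snakes": 0.6}
    print(rank_labels(toy_parents, toy_seeds, {"Computing", "Animals"}))
```

Scoring each category only once, at its shallowest depth, keeps the traversal bounded even if the graph contains cycles. Other choices, such as summing over all paths or letting the decay vary with depth, would change the resulting ranking.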
