Both the WWW research community and industry have shown increasing interest in finding not just relevant documents, but specific objects or entities that satisfy more sophisticated user information needs. TREC launched an Entity Track in 2009 to investigate the task of related entity finding (REF). This paper proposes two novel probabilistic models that integrate several components into a unified modeling process. In particular, the type matching component characterizes the degree of matching between the expected entity type, which is inferred from the query, and the candidate entity type, which is inferred from the entity profile. Another important component incorporates prior knowledge about entities into the retrieval process. The main difference between the two models is that the second model explicitly considers the effect of the source entity, while the first does not. A comprehensive set of experiments, carefully designed to show the contributions of individual components, was conducted on the TREC Entity Track testbeds from 2009 to 2011. The results demonstrate that both the type matching component and the entity prior modeling component can effectively boost entity retrieval performance. Furthermore, the second model outperforms the first in all settings, indicating the benefit of explicitly modeling the source entity in related entity finding. Both models achieve results that are better than or competitive with the state-of-the-art results on the TREC REF tasks. In addition, the proposed unified probabilistic approach is applied to the TREC Entity List Completion task, where it also demonstrates good performance.