Detecting user intention in information retrieval systems is a critical task that directly affects the user experience. Because retrieval datasets can be extremely large, complex, and heterogeneously correlated, we propose a framework named DAHH, which uses a Divide-and-Aggregate strategy to incorporate Heterogeneous Hypergraphs for large-scale user retrieval intention detection. In the proposed framework, the large-scale retrieval dataset is first divided into multiple subsets, each small enough for the model to process. The same construction rules are then applied to every subset to build heterogeneous hypergraphs, and hypergraph convolution is performed independently on each hypergraph to generate local, high-order, topology-enhanced vertex representations. Next, aggregation correlates all the local hypergraphs at different levels to produce global, high-order, heterogeneity-enhanced representations. Finally, the local and global high-order representations are combined to predict user intentions. The framework can therefore handle heterogeneous and high-order correlations in large-scale user retrieval data. Extensive experiments on our collected real-world search engine dataset with millions of samples, as well as on several widely used public datasets, show that DAHH outperforms other state-of-the-art models across the board.
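The abstract does not specify DAHH's hyperedge-construction rules, convolution operator, or aggregation scheme. The following minimal NumPy sketch illustrates the general pipeline: divide, then local hypergraph convolution, then aggregate, then combine. It rests on three assumptions. First, hyperedges join samples that share a value of a categorical attribute, such as the same user or query category (a hypothetical rule). Second, the convolution uses the common spectral form X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta). Third, global aggregation averages each hyperedge's representation across all subsets. The function names `hypergraph_conv`, `build_incidence`, and `divide_and_aggregate` are hypothetical. This is an illustration, not DAHH's actual implementation.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    """One spectral hypergraph convolution layer (assumed HGNN-style form).
    X: (n, d) vertex features, H: (n, m) incidence matrix, Theta: (d, k) weights."""
    n, m = H.shape
    w = np.ones(m) if w is None else w              # hyperedge weights (diagonal of W)
    dv = H @ w                                      # weighted vertex degrees
    de = H.sum(axis=0)                              # hyperedge degrees
    dv_inv_sqrt = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1e-12)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1e-12), 0.0)
    left = (dv_inv_sqrt[:, None] * H) * (w * de_inv)[None, :]   # Dv^{-1/2} H W De^{-1}
    right = H.T * dv_inv_sqrt[None, :]                          # H^T Dv^{-1/2}
    A = left @ right                                            # (n, n) propagation matrix
    return np.maximum(A @ X @ Theta, 0.0)                       # ReLU activation

def build_incidence(attrs, vocab):
    """Hypothetical consistent rule: hyperedge e = all samples whose attribute
    column t equals value v, for (t, v) = vocab[e]. A shared vocab gives every
    subset the same hyperedge indexing, so subsets can be aggregated later."""
    H = np.zeros((attrs.shape[0], len(vocab)))
    for e, (t, v) in enumerate(vocab):
        H[:, e] = attrs[:, t] == v
    return H

def divide_and_aggregate(X, attrs, vocab, Theta, num_subsets):
    n, k, m = X.shape[0], Theta.shape[1], len(vocab)
    parts = np.array_split(np.arange(n), num_subsets)   # divide into subsets
    local = np.zeros((n, k))
    edge_sum, edge_cnt = np.zeros((m, k)), np.zeros(m)
    built = []
    for idx in parts:
        H = build_incidence(attrs[idx], vocab)
        Z = hypergraph_conv(X[idx], H, Theta)           # local high-order representations
        local[idx] = Z
        edge_sum += H.T @ Z                              # accumulate per-hyperedge sums
        edge_cnt += H.sum(axis=0)
        built.append((idx, H))
    # Assumed global aggregation: mean representation of each hyperedge over all subsets.
    edge_global = edge_sum / np.maximum(edge_cnt, 1.0)[:, None]
    glob = np.zeros((n, k))
    for idx, H in built:
        deg = np.maximum(H.sum(axis=1), 1.0)[:, None]
        glob[idx] = (H @ edge_global) / deg              # average of incident global hyperedges
    return np.concatenate([local, glob], axis=1)        # combine local and global views

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))                            # 10 samples, 4 features
attrs = rng.integers(0, 3, size=(10, 2))                # 2 attribute types, 3 values each
vocab = [(t, v) for t in range(2) for v in range(3)]    # 6 candidate hyperedges
Theta = rng.normal(size=(4, 8))
out = divide_and_aggregate(X, attrs, vocab, Theta, num_subsets=3)
print(out.shape)  # (10, 16)
```

Sharing one hyperedge vocabulary across subsets is what allows the per-subset hypergraphs to be linked during aggregation without ever materializing one graph over the full dataset. A downstream classifier would consume the concatenated representation to predict intent.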