Abstract

In this paper, we propose a page clipping search engine based on page discussion topics. Compared to other search engines, our search engine uses the page discussion topic instead of the search engine results page as the main result. After the user selects the topic of interest, our search engine will clip the relevant pages according to the selected topic and produce an integrated page result. The advantage of this topic-based integration page result is that the user can reduce the time it takes to decide whether the page content is relevant. Our results consist of two parts: the query-related discussion topics and the clipping results for relevant pages. We first use an adjusted N-gram language model and a hash method to produce discussion topics. At the same time, we use the idea of binary coding and mathematical set to organize related topics into a hierarchical topic tree with parent–child relationship. Next, we use a cost-effective genetic algorithm to produce the relevant page clipping results. This study has the following three advantages. The first is that we can find multiple clustering relationships, that is, a child topic can appear simultaneously in multiple parent topics. The second is that we propose a good topic generation method, that is, we cannot only produce better quality topics, but also produce the topic tree in a linear time. The third is that we propose a good clipping generation method, that is, we cannot only produce better quality clippings, but also produce a cost-effective solution.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.