Abstract

After analyzing the index structure and access mode of ASPSeek, an open-source search engine, this paper gives an abstract definition of the inverted index. ASPSeek accesses its index directly through the operating system's file cache. This makes index updates difficult and limits retrieval efficiency. To address these problems, this paper proposes a new block-based index storage scheme with a buffer mechanism built on the CLOCK replacement algorithm. The design takes into account the characteristics of a collection of 1.25 million Chinese agricultural Web pages. Experimental results show that the new scheme is more efficient than ASPSeek whether the buffer system is disabled or enabled. With the buffer system enabled, two test sets were used: 160 thousand Chinese terms, and 50 thousand high-frequency Chinese terms. On either set, the retrieval time of the new scheme levels off to a constant after one million accesses. Even when all 827 309 terms are used as the test set, the retrieval time of the new scheme begins to converge after two million accesses.
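The abstract does not give the buffer's internals. The following is a minimal sketch of a CLOCK (second-chance) buffer over fixed-size index blocks. It assumes the inverted index is split into blocks identified by integer ids and that a caller-supplied function reads one block from disk on a miss. The names `ClockBuffer`, `load_block`, and the integer block ids are hypothetical illustrations, not the paper's actual layout or API.

```python
from typing import Callable, Dict, List, Optional


class ClockBuffer:
    """Hypothetical fixed-capacity buffer of index blocks using CLOCK replacement."""

    def __init__(self, capacity: int, load_block: Callable[[int], bytes]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.load_block = load_block  # assumed: reads one index block from disk on a miss
        self.frames: List[Optional[int]] = [None] * capacity  # block id held by each frame
        self.data: List[Optional[bytes]] = [None] * capacity  # cached block contents
        self.ref_bit: List[bool] = [False] * capacity         # CLOCK reference bits
        self.where: Dict[int, int] = {}                       # block id -> frame index
        self.hand = 0                                         # position of the clock hand
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        frame = self.where.get(block_id)
        if frame is not None:
            # Hit: set the reference bit so the block survives the next sweep.
            self.hits += 1
            self.ref_bit[frame] = True
            return self.data[frame]
        # Miss: pick a victim frame, evict its block, and load the requested one.
        self.misses += 1
        frame = self._find_victim()
        old = self.frames[frame]
        if old is not None:
            del self.where[old]
        self.frames[frame] = block_id
        self.data[frame] = self.load_block(block_id)
        self.ref_bit[frame] = True
        self.where[block_id] = frame
        return self.data[frame]

    def _find_victim(self) -> int:
        # Advance the hand. An empty frame or one with a cleared bit is chosen;
        # a frame with its bit set gets a second chance (bit cleared, hand moves on).
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.frames[frame] is None or not self.ref_bit[frame]:
                return frame
            self.ref_bit[frame] = False
```

CLOCK is chosen over strict LRU here because it approximates recency with one bit per frame and a rotating hand. This avoids reordering a list on every hit. With a buffer of three frames and the access sequence 1, 2, 3, 1, 4, 2, 5, block 2 is kept on the final miss because its reference bit was set again after the previous sweep, while block 3 is evicted.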

