Abstract

Caching is an effective optimization in search engine storage architectures, and many caching algorithms have been proposed to improve retrieval performance. The data selection policy plays an important role in search engine cache management: it decides which data to place in memory or in other storage, such as solid-state drives (SSDs). Since the historical query log is a useful guide to future queries, we present an Efficient Data Selection (EDS) policy for search engine cache management that models the cache media as a knapsack and the query results and posting lists as items; the selection with the best benefit is computed by a greedy algorithm. We carry out a series of experiments to study the essential factors of data selection in different architectures, including hard disk drive (HDD), SSD, and SSD-based hybrid storage architectures. The hybrid storage architecture is a two-level cache architecture that uses the SSD as a secondary cache behind memory. Our main goal is to improve the performance of search engines and reduce server cost on the two-level cache architecture. The experimental results demonstrate that the proposed policy improves the hit ratio by 20.04%, and improves retrieval performance on the HDD, SSD, and hybrid architectures by 31.98%, 28.72%, and 23.24%, respectively.
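
For intuition, the snippet below is a minimal sketch of the kind of greedy benefit-per-size selection that such a knapsack formulation suggests. The item fields, benefit values, and cache capacity are illustrative assumptions for this sketch, not the paper's exact EDS formulation.

```python
# Minimal sketch (assumed structure, not the paper's exact EDS algorithm):
# treat the cache as a knapsack and pick items greedily by benefit per byte.
from dataclasses import dataclass


@dataclass
class CacheItem:
    key: str        # identifier of a query result or posting list
    benefit: float  # estimated gain, e.g. derived from historical query-log frequency
    size: int       # bytes the item would occupy in the cache


def greedy_select(items, capacity):
    """Fill a cache of `capacity` bytes, preferring items with the highest
    benefit-per-byte ratio (the classic greedy knapsack heuristic)."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.benefit / i.size, reverse=True):
        if used + item.size <= capacity:
            selected.append(item)
            used += item.size
    return selected


# Hypothetical example: choose what to keep in a 1 MB memory-level cache.
items = [
    CacheItem("result:q1", benefit=9.0, size=200_000),
    CacheItem("postings:termA", benefit=5.0, size=600_000),
    CacheItem("result:q2", benefit=3.0, size=300_000),
]
print([i.key for i in greedy_select(items, capacity=1_000_000)])
```

The same selection step could, in principle, be applied at each level of the two-level (memory/SSD) cache with a different capacity, which is how the knapsack view extends to the hybrid architecture described above.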
