Most companies now conduct their business through cloud networks, so fast data access over the Internet has become very important, and cache algorithms matter because they speed up that access. Cache replacement algorithms that work well for processor caches perform poorly when applied to web caching. The reason for this discrepancy is that processor cache algorithms are deliberately kept simple so as not to add complexity to that type of storage. Given the wide variation in file size and the heterogeneous patterns of user access to data in a web environment, especially for pages with dynamic content, there is a real need for dedicated web cache algorithms. This paper develops a hybrid web caching algorithm that uses semantic similarity. The GDFS algorithm is extended with NGD (Normalized Google Distance) to measure the semantic similarity between cache objects, which resulted in better performance than the other algorithms. The hybrid algorithm increased the hit rate over the basic algorithms by up to 80.10%. It also overcame the low byte hit rate of the GDFS algorithm: the improvement in byte hit rate reached 65%. Finally, using the Google-based distance, the hybrid algorithm reduced page load time relative to algorithms that do not use semantic similarity and to operation without a cache. Page load time fell from 4.17 seconds with GDFS-Line to 2.16 seconds with GDFS-NGD.
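To make the similarity measure concrete, the following is a minimal sketch of the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts the pages containing a term and N is the total number of indexed pages. The second function shows the commonly used Greedy-Dual-Size-Frequency priority key, on the assumption that this is what GDFS refers to. The abstract does not state how the two are combined, so the sketch does not attempt that step. The function and parameter names are illustrative and do not come from the paper.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance computed from page hit counts.

    fx, fy -- number of pages containing term x, term y
    fxy    -- number of pages containing both terms
    n      -- total number of indexed pages (normalizing constant)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated
        # as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(log_fx, log_fy)
    if denom <= 0:
        raise ValueError("n must exceed the smaller of fx and fy")
    # Clamp at zero to absorb tiny negative values from rounding.
    return max(0.0, (max(log_fx, log_fy) - log_fxy) / denom)


def gdsf_key(clock: float, frequency: int, cost: float, size: int) -> float:
    """Assumed Greedy-Dual-Size-Frequency priority: clock + frequency * cost / size.

    The object with the smallest key is evicted first; `clock` is the
    key of the most recently evicted object (the aging term).
    """
    return clock + frequency * cost / size
```

For example, `ngd(100, 1000, 10, 1e6)` evaluates to roughly 0.5, while two terms that always appear together, as in `ngd(1000, 1000, 1000, 1e6)`, have distance 0. Smaller distances mean the two cache objects are semantically closer.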