Abstract

Multimedia web resources contain many duplicated web pages. Eliminating these duplicates reduces storage costs and improves search engine performance. Based on an analysis of the classic duplicate-elimination algorithms, this article proposes an improved algorithm for detecting repeated web page text. The new algorithm performs elimination on the basis of web page content, which serves as the feature vector when comparing the similarity of web pages, and analyzes how to capture a web page's theme. This yields a multidimensional improvement in eliminating duplicate multimedia web pages.
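
As a rough illustration of content-based duplicate detection, the sketch below represents each page's text as a feature vector and drops pages that are too similar to one already kept. The abstract does not specify the features, the similarity measure, or a threshold, so the term-frequency vectors, the cosine similarity, the 0.9 threshold, and the function names `term_vector`, `cosine_similarity`, and `deduplicate` are all assumptions made for illustration and are not the article's algorithm.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (word -> count) from page text.

    Assumption: lowercase alphanumeric tokens stand in for the page's
    content features; the article's actual features are not specified.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


def deduplicate(pages, threshold=0.9):
    """Keep a page only if it is below the threshold against every kept page.

    The threshold value is hypothetical and would need tuning on real data.
    """
    kept = []  # list of (text, vector) pairs for pages retained so far
    for text in pages:
        vec = term_vector(text)
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]
```

Calling `deduplicate` on a list of page texts returns the first occurrence of each group of near-identical pages in their original order. The pairwise comparison is quadratic in the number of pages, so a large collection would need an indexing or hashing step in front of it.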
