Previous work on web-based applications has addressed mining web content, pattern recognition, and similarity measures between web documents. This paper analyzes web documents in an enhanced way and argues that distilling web documents will be the next step in hypertext mining. Sparse documents, which contain very little text, are common on the web. They suffer from sparseness and from synonymy, where different words have identical or similar meanings. These problems are the main obstacles for natural language processing (NLP) and information retrieval (IR) on such data. Hidden-term mining discovers terms related to search queries from large external datasets (universal datasets), which helps handle unseen data more effectively. The goals of this web document mining are efficient information finding, filtering information based on the user query, and discovering more topic-focused keywords from the rich global information datasets. The proposed method uses the Distillation model, which integrates a probabilistic generative model, the Gibbs sampling algorithm, and a deployment method. This model can be applied to different natural languages and data domains to achieve these goals.
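The abstract does not name the probabilistic generative model inside the Distillation model. Assuming it is latent Dirichlet allocation (LDA), a common choice for hidden-topic mining, the minimal Python sketch below shows collapsed Gibbs sampling on a universal dataset, followed by a deployment-style step that infers a topic mixture for a short, sparse document while keeping the learned topic-word counts fixed. The function names `lda_gibbs`, `infer_topics`, and `top_words`, the hyperparameters `alpha` and `beta`, and the toy corpus are hypothetical illustrations, not the paper's implementation.

```python
import random

# Assumption: LDA is used as the probabilistic generative model.
# All names and hyperparameters here are hypothetical.


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a (universal) dataset.

    docs: list of documents, each a list of word tokens.
    Returns (n_kw, n_dk, vocab): topic-word counts, doc-topic counts, vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    return n_kw, n_dk, vocab


def infer_topics(doc, n_kw, vocab, alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Infer a topic mixture for a new short document with n_kw held fixed."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    K, V = len(n_kw), len(vocab)
    n_k = [sum(row) for row in n_kw]
    tokens = [word_id[w] for w in doc if w in word_id]  # words unseen in training are skipped
    z = [rng.randrange(K) for _ in tokens]
    n_dk = [0] * K
    for k in z:
        n_dk[k] += 1
    for _ in range(iterations):
        for i, wid in enumerate(tokens):
            n_dk[z[i]] -= 1
            weights = [
                (n_dk[t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                for t in range(K)
            ]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_dk[z[i]] += 1
    total = len(tokens) + K * alpha
    return [(n_dk[t] + alpha) / total for t in range(K)]


def top_words(n_kw, vocab, k, n=3):
    """Return the n words most often assigned to topic k (candidate hidden terms)."""
    ranked = sorted(range(len(vocab)), key=lambda w: -n_kw[k][w])
    return [vocab[w] for w in ranked[:n]]


# Toy "universal dataset" (hypothetical) and a sparse query-like document.
docs = [
    "stock market trading price shares".split(),
    "market price stock investors trading".split(),
    "football match goal team players".split(),
    "team players coach football league".split(),
]
n_kw, n_dk, vocab = lda_gibbs(docs, num_topics=2, iterations=200)
theta = infer_topics(["stock", "price"], n_kw, vocab)
best = max(range(len(theta)), key=lambda t: theta[t])
hidden_terms = top_words(n_kw, vocab, best, n=3)
```

In this sketch, the words of the dominant topic (`hidden_terms`) play the role of hidden terms that could enrich the sparse document; the sampler is seeded so runs are reproducible, but the exact topics depend on the seed and corpus.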