This paper addresses the sheer volume and noise of information on today's Internet. The goals are to mine users' interests more effectively and provide accurate recommendations, to improve the Internet environment and its services, and to accurately analyze and evaluate the behavior of large-scale user groups. The paper draws an analogy between web pages and sentences in natural language, and uses the natural language processing algorithm Word2vec to extract feature vectors for the massive number of web pages browsed by users. The feature vectors are reduced to two dimensions with PCA for webpage visualization, and logistic regression is used to classify the web pages. A series of user clicks on web pages can then be viewed as a walk in n-dimensional space, and probabilistic language models and Monte Carlo simulation are used to predict the user's next click. Experiments show that the natural language processing and machine learning algorithms used in this paper perform well in terms of classification and prediction accuracy.
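As a rough illustration of the prediction step, the following Python sketch treats each click session as a sentence, fits a first-order (bigram) transition model over pages, and uses Monte Carlo sampling to estimate where a user lands after a few clicks. The bigram order, the toy sessions, and the function names (`fit_bigram`, `next_click_probs`, `simulate`) are assumptions made for illustration; they are not taken from the paper, whose actual model may differ.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(sessions):
    """Count page-to-page transitions across click sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return counts


def next_click_probs(counts, page):
    """Maximum-likelihood estimate of P(next page | current page)."""
    followers = counts.get(page, {})
    total = sum(followers.values())
    if total == 0:
        return {}
    return {nxt: c / total for nxt, c in followers.items()}


def simulate(counts, start, steps, n_runs=2000, seed=0):
    """Monte Carlo: sample n_runs walks of up to `steps` clicks from
    `start` and return the empirical distribution of the final page."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_runs):
        page = start
        for _ in range(steps):
            followers = counts.get(page)
            if not followers:
                break  # no observed outgoing click; the walk ends here
            pages, weights = zip(*followers.items())
            page = rng.choices(pages, weights=weights, k=1)[0]
        tally[page] += 1
    return {p: c / n_runs for p, c in tally.items()}


# Hypothetical toy click sessions (page identifiers are made up).
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "finance"],
    ["home", "shop", "cart"],
    ["home", "news", "sports"],
]
counts = fit_bigram(sessions)
print(next_click_probs(counts, "home"))
print(simulate(counts, "home", steps=2))
```

In this sketch the one-step prediction comes directly from the transition counts, while the simulation approximates multi-step behavior by sampling many walks, which avoids enumerating every possible click path.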