Massive Web Pages Research Articles

With the development of mobile technology, users' browsing habits have gradually shifted from pure information retrieval to active recommendation. Mapping user interests to web content by classification has become more and more difficult as the volume and variety of web pages grow. Some large news portals and social media companies hire more editors to label new concepts and words, and use computing servers with larger memory to handle massive document classification based on traditional supervised or semi-supervised machine learning methods. This paper provides an optimized algorithm for massive web page classification using semantic networks such as Wikipedia and WordNet. We used the Wikipedia data set and initialized a few category entity words as class words. A weight estimation algorithm based on the depth and breadth of the Wikipedia network is used to calculate the class weight of all Wikipedia Entity Words. A kinship-relation association based on the content similarity of entities is proposed to mitigate the imbalance problem that arises when a category node inherits probability from multiple parent nodes. The keywords in a web page are extracted from the title and the main text using N-grams matched against Wikipedia Entity Words, and a Bayesian classifier is used to estimate the page class probability. Experimental results showed that the proposed method achieved good scalability, robustness and reliability for massive web pages.
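The last two steps of the abstract above (N-gram keyword extraction against an entity dictionary, followed by a Bayesian class estimate) can be illustrated with a minimal sketch. It is not the paper's implementation: the entity table, its likelihood values, the priors, and the function names `extract_entities` and `classify` are all hypothetical, and a plain naive Bayes rule stands in for whatever classifier variant the authors used. The class weights that the paper derives from the Wikipedia network are replaced here by hand-written toy numbers.

```python
import math
import re
from collections import Counter

# Hypothetical toy table of P(entity | class). In the paper these weights
# come from the Wikipedia network; the values here are made up.
ENTITY_LIKELIHOOD = {
    "machine learning": {"tech": 0.60, "sports": 0.01},
    "neural network": {"tech": 0.30, "sports": 0.01},
    "world cup": {"tech": 0.01, "sports": 0.70},
}
PRIORS = {"tech": 0.5, "sports": 0.5}  # assumed uniform class priors
FLOOR = 1e-6  # fallback likelihood for a class missing from the table


def extract_entities(text, entities, max_n=3):
    """Count every word N-gram (N = 1..max_n) that is a known entity."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in entities:
                found[gram] += 1
    return found


def classify(title, body):
    """Return normalized class probabilities using a naive Bayes rule."""
    # Title and body are scanned separately so no N-gram spans both.
    counts = extract_entities(title, ENTITY_LIKELIHOOD)
    counts += extract_entities(body, ENTITY_LIKELIHOOD)
    log_scores = {}
    for cls, prior in PRIORS.items():
        score = math.log(prior)
        for entity, count in counts.items():
            likelihood = ENTITY_LIKELIHOOD[entity].get(cls, FLOOR)
            score += count * math.log(likelihood)
        log_scores[cls] = score
    # Normalize in log space to avoid underflow on long pages.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}
```

Calling `classify("Machine learning news", "A neural network beats humans")` assigns most of the probability mass to the `tech` class, because both matched entities have a much higher toy likelihood under that class.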

As a consequence of rapid and immoderate urbanization, effectively simulating urban growth in metropolitan areas has become a crucial yet difficult task. The cellular automata (CA) model is an attractive tool for understanding complex geographical phenomena. Although intercity urban flows, the key factors in metropolitan development, have already been taken into consideration in CA models, there is still room for improvement because the influences of urban flows may not necessarily follow the distance decay relationship and may change over time. A feasible solution is to define the weights of intercity urban flows. Therefore, this study presents a novel method based on weighted urban flows (CAWeightedFlow) with the support of a web search engine. The relatedness measured by the co-occurrences of the cities’ names (toponyms) on massive web pages can be taken as the weights of intercity urban flows. After applying the weights, the gravitational field model is integrated with Logistic-CA to fulfill the modeling task. This method is applied to the urban growth simulation of the Pearl River Delta, one of the most urbanized metropolitan areas in China, from 2005 to 2008. The results indicate that our method outperforms traditional methods with respect to two measures of calibration goodness-of-fit. For example, CAWeightedFlow yields the best value of ‘figure of merit’. Moreover, the proposed method can be further used to explore various development possibilities by simply changing the weights.
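The idea of weighting intercity flows by toponym co-occurrence can be sketched as follows. This is an illustration under stated assumptions, not the CAWeightedFlow model: the abstract does not give the relatedness formula, so a Jaccard-style ratio of page-hit counts is assumed here, and a textbook gravity term (mass product over distance to a power) stands in for the gravitational field model. The cities `A`, `B`, `C` and all counts, masses, and distances are made-up values, and the Logistic-CA step is not shown.

```python
# Made-up search-engine hit counts for three hypothetical cities.
HITS = {"A": 1000, "B": 800, "C": 500}
# Made-up counts of pages on which both names co-occur.
CO_HITS = {("A", "B"): 400, ("A", "C"): 50, ("B", "C"): 100}
# Made-up city "masses" (e.g. population or GDP) and pairwise distances.
MASS = {"A": 10.0, "B": 8.0, "C": 5.0}
DIST = {("A", "B"): 100.0, ("A", "C"): 50.0, ("B", "C"): 80.0}


def pair_key(a, b):
    """Order a city pair so lookups are symmetric."""
    return tuple(sorted((a, b)))


def relatedness(a, b):
    """Jaccard-style co-occurrence ratio (an assumed measure)."""
    both = CO_HITS[pair_key(a, b)]
    return both / (HITS[a] + HITS[b] - both)


def gravity(a, b, beta=2.0):
    """Unweighted gravity term: mass product over distance**beta."""
    return MASS[a] * MASS[b] / DIST[pair_key(a, b)] ** beta


def weighted_flow(a, b, beta=2.0):
    """Gravity term scaled by the co-occurrence weight."""
    return relatedness(a, b) * gravity(a, b, beta)
```

With these toy numbers, the unweighted gravity term ranks the closer pair `A`-`C` above `A`-`B`, while `weighted_flow` reverses the order because `A` and `B` co-occur far more often. That reversal is the kind of departure from pure distance decay the abstract refers to.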

Articles published on Massive Web Pages

Visualization Classification and Prediction Based on Data Mining

Expert Information Automatic Extraction for IOT Knowledge Base

An optimized approach for massive web page classification using entity similarity based on semantic network

Survey on Different Ranking Algorithms Along With Their Approaches

Simulating urban growth in a metropolitan area based on weighted urban flows by using web search engine

Document Clustering Using Semantic Cliques Aggregation

Template-Based Delta Compression of Large Scale Web Pages

Web Entities Extraction Based on Semi-Structured Semantic Database

Analyzing Relatedness by Toponym Co‐Occurrences on Web Pages

A Length-variable Feature Code Based Fuzzy Duplicates Elimination Approach for Large Scale Chinese WebPages
