This paper proposes an advanced countermeasure against distributed web crawlers. We investigated existing crawler-detection methods and analyzed how distributed crawlers can bypass them. Our method detects distributed crawlers by exploiting the property that web traffic follows a power-law distribution: when web pages are sorted by the number of requests, most requests are concentrated on a small set of frequently requested pages. In addition, there are pages that normal users rarely request, yet crawlers do request them because their algorithms iteratively parse pages and follow every item they encounter. Therefore, we assume that IP addresses frequently used to request pages in the long-tail region of the power-law distribution can be classified as crawler nodes. Experimental results on NASA web traffic data showed that our method identifies distributed crawlers with 0.0275% false positives, whereas a conventional frequency-based detection method yields 2.882% false positives at the same access threshold.
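As a rough illustration of the long-tail idea described above (not the paper's exact procedure), the sketch below ranks pages by request count, treats the least-requested portion as the long tail, and flags IP addresses whose long-tail hits exceed a threshold; the function name, the tail fraction, and the hit threshold are hypothetical placeholders, not the authors' tuned parameters.

```python
from collections import Counter, defaultdict

def find_crawler_candidates(requests, tail_fraction=0.5, tail_hit_threshold=10):
    """Flag IPs that repeatedly request pages in the long tail of the
    request-frequency distribution.

    `requests` is an iterable of (ip, url) pairs. `tail_fraction` and
    `tail_hit_threshold` are illustrative values only.
    """
    # Count how many times each page was requested.
    page_counts = Counter(url for _, url in requests)

    # Rank pages by popularity; the least-requested fraction forms the long tail.
    ranked = [url for url, _ in page_counts.most_common()]
    tail_start = int(len(ranked) * (1 - tail_fraction))
    tail_pages = set(ranked[tail_start:])

    # Count how often each IP address hits a long-tail page.
    tail_hits = defaultdict(int)
    for ip, url in requests:
        if url in tail_pages:
            tail_hits[ip] += 1

    # IPs with many long-tail requests are treated as crawler candidates.
    return {ip for ip, hits in tail_hits.items() if hits >= tail_hit_threshold}
```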