Thanks to the rapid expansion of the Internet, anyone can now access a vast array of information online. However, as the volume of web content continues to grow exponentially, search engines face challenges in delivering relevant results. Early search engines primarily relied on the words or phrases found within web pages to index and rank them. While this approach had its merits, it often produced irrelevant or inaccurate results. To address this issue, more advanced search engines began incorporating the hyperlink structure of web pages to help determine their relevance. While this method improved retrieval accuracy to some extent, it still had limitations, as it did not consider the actual content of web pages. The objective of this work is to enhance Web Information Retrieval methods by leveraging three key components: text content analysis, link analysis, and log file analysis. By integrating insights from these multiple data sources, the goal is to achieve a more accurate and effective ranking of relevant web pages in the retrieved document set, ultimately enhancing the user experience and delivering more precise search results. The proposed system was tested with both multi-word and single-word queries, and the results were evaluated using metrics such as relative recall, precision, and F-measure. When compared to Google's PageRank algorithm, the proposed system demonstrated superior performance, achieving an 81% mean average precision, 56% average relative recall, and a 66% F-measure.
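The evaluation metrics named above have standard definitions, which can be sketched as follows (a minimal illustration, not the paper's actual evaluation code; the relative-recall definition assumed here is the common pooled form, i.e. relevant documents found by one system divided by relevant documents found by all compared systems combined):

```python
def precision(relevant, retrieved):
    """Fraction of retrieved documents that are relevant."""
    retrieved = set(retrieved)
    return len(set(relevant) & retrieved) / len(retrieved)

def relative_recall(found_by_system, found_by_all_systems):
    """Relevant documents found by this system divided by the
    relevant documents found by all compared systems combined."""
    return len(set(found_by_system)) / len(set(found_by_all_systems))

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# The reported figures are internally consistent:
# F = 2 * 0.81 * 0.56 / (0.81 + 0.56) ≈ 0.66
print(round(f_measure(0.81, 0.56), 2))  # → 0.66
```

Note that the reported 66% F-measure is indeed the harmonic mean of the reported 81% precision and 56% relative recall.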