Information Extraction from Hypertext Mark-Up Language Web Pages

Aida Mustapha, Hamidah Ibrahim, Lili Nurliyana Abdullah, Mahmoud Shaker

doi:10.3844/jcssp.2009.596.607

Abstract

Problem statement: Many users now rely on web search engines to find and gather information, and they face a growing number of diverse HTML information sources. Correlating, integrating and presenting related information to users has therefore become important. When a user queries a search engine such as Yahoo or Google for specific information, the results include not only information about where the desired information is available but also information about other pages on which it is mentioned, so the number of returned pages is enormous. The performance of web search engines, the overlap among their results for the same queries and their limitations are consequently a large and important area of research. Extracting information from web pages has also become very important: the massive and increasing amount of diverse HTML information sources available on the internet, together with the variety of web pages, makes information extraction from the web a challenging problem. Approach: This study proposed an approach for extracting relevant information from different HTML web pages based on standard classifications. Results: The proposed approach was evaluated through experiments on a number of web pages from different domains and achieved an increase in precision and F-measure as well as a decrease in recall. Conclusion: The experiments demonstrated that the approach extracted attributes, together with the sub-attributes that describe them and the values of those sub-attributes, from various web pages. The proposed approach was also able to extract attributes that appear under different names in some of the web pages.
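The precision, recall and F-measure mentioned in the results can be illustrated with a minimal sketch. It uses the standard set-based definitions and is not taken from the paper. The function name `precision_recall_f1` and the attribute names in the sample data are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(extracted, relevant):
    """Return (precision, recall, F-measure) using the standard definitions.

    extracted: attributes an extractor returned (hypothetical input)
    relevant:  attributes a human judged correct for the page (hypothetical input)
    """
    extracted, relevant = set(extracted), set(relevant)
    correct = len(extracted & relevant)  # attributes that were both extracted and relevant

    # Guard against empty sets to avoid division by zero.
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure here is the harmonic mean of precision and recall (F1).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up sample data, not results from the study.
extracted = {"price", "brand", "model", "colour"}
relevant = {"price", "brand", "model", "weight", "warranty"}
p, r, f = precision_recall_f1(extracted, relevant)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

With these definitions, an extractor that returns fewer but more accurate attributes can gain precision while losing recall, which is the kind of trade-off the results describe.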

Journal: Journal of Computer Science
Publication Date: Aug 1, 2009
Citations: 17

Similar Papers

Information extraction from web tables
Mahmoud Shaker ... Aida Mustapha
14 Dec 2009

Internet Search Engines
Vijay Kasi ... Radhika Jain
01 Jan 2006

Internet Search Engines
Vijay Kasi ... Radhika Jain
18 Jan 2011

Introduction to Webometrics: Quantitative Web Research for the Social Sciences
Michael Thelwall
Synthesis Lectures on Information Concepts, Retrieval, and Services | VOL. 1
01 Jan 2009
