Profile-Based Focused Crawling for Social Media-Sharing Websites

Zhiyong Zhang, Olfa Nasraoui

doi:10.1155/2009/856037

Abstract

We present a novel profile-based focused crawling system for the increasingly popular social media-sharing websites. In this system, user profiles serve as ranking criteria that guide the crawling process. We divide a user's profile into two parts: an internal part, which comes from the user's own contributions, and an external part, which comes from the user's social contacts. To expand the crawling topic, we adopt a cotagging topic-discovery scheme for social media-sharing websites. To extract data for focused crawling efficiently and effectively, we first develop a path string-based page classification method that identifies list pages, detail pages, and profile pages. Identifying the correct page type is essential, since each type of page yields different information, and that information is needed to estimate a reasonable ranking for each link encountered while crawling. Our experiments on the Flickr website, for two different topics, demonstrate the robustness of our profile-based focused crawler and a significant improvement in harvest ratio over breadth-first and online page importance computation (OPIC) crawlers.
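To make the ranking idea concrete, the following is a minimal sketch of how a two-part profile could order the links waiting to be crawled. It is an illustration only, not the scoring used in the paper: the names Profile, profile_score, rank_links, and harvest_ratio, the Jaccard overlap measure, and the weight alpha are all assumptions introduced here. The harvest ratio function follows the usual definition, the fraction of crawled pages that are relevant to the topic.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A user's profile, split into an internal and an external part."""
    internal_tags: set = field(default_factory=set)  # from the user's own contributions
    external_tags: set = field(default_factory=set)  # from the user's social contacts


def profile_score(profile, topic_tags, alpha=0.7):
    """Score a profile against a topic.

    Hypothetical weighting: alpha favors the internal part over the
    external part. The paper's actual formula is not reproduced here.
    """
    def overlap(tags):
        # Jaccard overlap between a tag set and the topic tags.
        union = tags | topic_tags
        return len(tags & topic_tags) / len(union) if union else 0.0

    return (alpha * overlap(profile.internal_tags)
            + (1 - alpha) * overlap(profile.external_tags))


def rank_links(links, profiles, topic_tags):
    """Order (url, owner) pairs by the owner's profile score, best first."""
    heap = []
    for order, (url, owner) in enumerate(links):
        score = profile_score(profiles.get(owner, Profile()), topic_tags)
        # heapq is a min-heap, so negate the score; 'order' breaks ties stably.
        heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def harvest_ratio(relevance_flags):
    """Fraction of crawled pages judged relevant to the topic."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)


topic = {"sunset", "beach"}
profiles = {
    "alice": Profile({"sunset", "beach"}, {"sunset"}),
    "bob": Profile({"cars"}, {"beach", "city"}),
}
ranked = rank_links([("u_bob", "bob"), ("u_alice", "alice")], profiles, topic)
```

In this sketch a link inherits the score of the user who owns the page it points to, so links from users whose own tags and whose contacts' tags match the topic are crawled first.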

Highlights

Social media-sharing websites such as Flickr and YouTube are becoming more and more popular.
We propose to use a Document Object Model (DOM) path string-based method for page classification.
We assume that, with the path string method, not having to consider schema path strings saves a lot of effort when extracting the real data.

Summary

Introduction

Social media-sharing websites such as Flickr and YouTube are becoming more and more popular. Little attention has been paid to effectively exploiting the second type of information, namely the user profiles, to enhance focused search on social media websites. We exploit users' profile information from social media-sharing websites to develop a more accurate focused crawler, which is expected to enhance the accuracy of multimedia search. To begin the focused crawling process, we first need to accurately identify the type of each page. To this end, we propose a Document Object Model (DOM) path string-based method for page classification.
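As a rough illustration of the path string idea, the sketch below collects, for each piece of text in a page, the chain of tag names leading from the root of the DOM tree down to that text, and then labels a new page by comparing its set of path strings with those of labeled example pages. The details are assumptions made for this example and are not taken from the paper: the "/" separator, the Jaccard similarity, the nearest-example rule, and the names PathStringCollector, path_strings, and classify are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathStringCollector(HTMLParser):
    """Collects the root-to-text tag path for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, root first
        self.paths = set()  # distinct path strings seen so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.stack:
            self.paths.add("/".join(self.stack))


def path_strings(html):
    """Return the set of path strings found in an HTML document."""
    collector = PathStringCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(html, examples):
    """Label a page with the type of the most similar example page.

    'examples' maps a page type ("list", "detail", "profile") to the HTML
    of one labeled example; this nearest-example rule is an assumption.
    """
    paths = path_strings(html)
    return max(examples,
               key=lambda label: jaccard(paths, path_strings(examples[label])))


examples = {
    "list": "<html><body><ul><li><a href='#'>one</a></li>"
            "<li><a href='#'>two</a></li></ul></body></html>",
    "detail": "<html><body><h1>Title</h1><div><p>Description</p></div></body></html>",
    "profile": "<html><body><table><tr><td>Name</td><td>Joined</td></tr>"
               "</table></body></html>",
}
page = "<html><body><ul><li><a href='#'>three</a></li></ul></body></html>"
label = classify(page, examples)
```

Because repeated items on a list page share one path string, the set representation stays small even for long pages, which is one reason a set-based comparison is a convenient starting point.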

Related Work

Motivation for Profile-Based Focused Crawling

Path String-Based Page Classification

Page Classification Using Path String

Profile-Based Focused Crawler

Cotagging Topic Discovery

Profile-Based Focused Crawling System

Experimental Results

Conclusions and Future Work

Journal: EURASIP Journal on Image and Video Processing
Publication Date: Jan 1, 2009
Citations: 5
License type: cc-by
