Abstract
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph records only the existence of a hyperlink; information on the context behind the hyperlink (e.g., informational, adversarial, commercial, spam) is absent. In this work-in-progress paper, we describe an ongoing collaborative project between two teams, one specializing in network science and analysis and the other in text analysis and machine learning, to address this oversimplification. The project applies techniques from natural language processing, text mining, and machine learning to extract relevant features of hyperlinks and classify each link into one of several types, and from these classifications it builds and analyzes a multi-relational web graph. A key aspect of this work is that the multi-relational graph emerges naturally from the data rather than being based on an imposed classification of the hyperlinks.
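To make the idea concrete, the multi-relational graph described above can be sketched as a directed graph whose edges carry a hyperlink-type label. The sketch below is a minimal illustration, not the paper's implementation; the class name, the label vocabulary, and the example URLs are all hypothetical, and the labels shown simply echo the context categories mentioned in the abstract.

```python
# Minimal sketch of a multi-relational web graph: a directed graph
# whose edges are labeled with a hyperlink type. All names here
# (class, methods, example domains, labels) are illustrative.
from collections import defaultdict


class MultiRelationalWebGraph:
    """Directed graph with typed edges (page -> page, link type)."""

    def __init__(self):
        # adjacency: source page -> list of (target page, link type)
        self.adj = defaultdict(list)

    def add_link(self, src, dst, link_type):
        """Record a hyperlink from src to dst with a given type label."""
        self.adj[src].append((dst, link_type))

    def out_links(self, src, link_type=None):
        """Outgoing links from src, optionally filtered by type."""
        links = self.adj[src]
        if link_type is None:
            return list(links)
        return [(dst, t) for dst, t in links if t == link_type]


# Usage: the same source page can carry links of different types,
# which a plain (untyped) Web graph cannot distinguish.
g = MultiRelationalWebGraph()
g.add_link("a.example", "b.example", "informational")
g.add_link("a.example", "c.example", "spam")
print(g.out_links("a.example", "spam"))  # [('c.example', 'spam')]
```

In the project described here, the type label on each edge would come from a machine-learned classifier over hyperlink features rather than being assigned by hand.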