Trawling the Web for emerging cyber-communities

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins

doi:10.1016/s1389-1286(99)00040-7

Journal: Computer Networks
Publication Date: May 1, 1999
Citations: 957

Affiliation: IBM Research - Almaden

#Huge Data Set #Web Crawl

Abstract

The Web harbors a large number of communities (groups of content creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl; we call this process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
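
The abstract does not spell out the graph-theoretic notion. In the full paper, the signature of a community is a small complete bipartite subgraph (an (i, j) core) in which i "fan" pages all link to the same j "center" pages. The following is a minimal sketch, not the authors' implementation, of one pruning idea consistent with that notion: repeatedly discard any fan with fewer than j outgoing links and any center with fewer than i incoming links, since neither can belong to an (i, j) core. The function name `prune_for_cores` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def prune_for_cores(edges, i, j):
    """Drop links that cannot take part in any (i, j) bipartite core.

    edges: iterable of (fan, center) pairs, one per hyperlink.
    A surviving fan links to at least j surviving centers, and a
    surviving center is linked from at least i surviving fans.
    Hypothetical helper for illustration only.
    """
    out_links = defaultdict(set)  # fan -> centers it points to
    in_links = defaultdict(set)   # center -> fans pointing to it
    for fan, center in edges:
        out_links[fan].add(center)
        in_links[center].add(fan)

    changed = True
    while changed:
        changed = False
        # Remove fans whose out-degree has fallen below j.
        for fan in [f for f, cs in out_links.items() if len(cs) < j]:
            for center in out_links.pop(fan):
                in_links[center].discard(fan)
            changed = True
        # Remove centers whose in-degree has fallen below i.
        for center in [c for c, fs in in_links.items() if len(fs) < i]:
            for fan in in_links.pop(center):
                out_links[fan].discard(center)
            changed = True

    return {(f, c) for f, cs in out_links.items() for c in cs}


# Toy crawl: pages a, b, c all link to x and y (a (3, 2) core),
# plus two stray links that pruning should discard.
toy_edges = [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "x"), ("b", "y"),
    ("c", "x"), ("c", "y"),
    ("d", "x"),
]
surviving = prune_for_cores(toy_edges, i=3, j=2)
```

Pruning of this kind only shrinks the candidate graph; the links that survive still have to be searched for actual cores, and at Web scale the adjacency sets above would not fit in memory as written.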
