Abstract

In this paper, we improve on trawling and identify some communities that trawling misses. We use a DBG (Dense Bipartite Graph) rather than a CBG (Complete Bipartite Graph) as the structure that identifies a potential community. Based on the DBG, we propose a new edge-removal method for extracting cores from a web graph. Moreover, we improve the crawler so that it saves only pages that are potential fans of a core, which saves a large amount of disk storage space. To evaluate whether a set of cores belongs to a community, we use term-frequency statistics. The experimental dataset was crawled under the “.cn” domain. The results show that our algorithm works properly and that our method finds some new cores.
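
To make the difference between a CBG and a DBG concrete, the following is a minimal sketch of one common way to extract a dense bipartite core by edge removal. It is not the algorithm of this paper, whose details the abstract does not give. It assumes a DBG is defined by two thresholds: every fan links to at least p centers, and every center is linked from at least q fans. The function name `extract_dense_core` and the parameters `p` and `q` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def extract_dense_core(edges, p, q):
    """Prune (fan, center) links until only a dense bipartite core remains.

    Assumption (not taken from the paper): a core is a set of links in which
    every fan has at least p out-links and every center has at least q in-links.
    """
    remaining = set(edges)
    while True:
        out_deg = defaultdict(int)  # links leaving each fan
        in_deg = defaultdict(int)   # links arriving at each center
        for fan, center in remaining:
            out_deg[fan] += 1
            in_deg[center] += 1
        # Remove every edge whose fan or center falls below its threshold.
        pruned = {
            (fan, center)
            for fan, center in remaining
            if out_deg[fan] >= p and in_deg[center] >= q
        }
        if pruned == remaining:  # nothing removed, so the core is stable
            break
        remaining = pruned
    fans = {fan for fan, _ in remaining}
    centers = {center for _, center in remaining}
    return fans, centers


# Fans a, b, c each link to two of the three centers x, y, z.
# The graph is dense but not complete (for example, a does not link to z),
# so a search for a complete 3x3 bipartite graph would not report it.
links = [
    ("a", "x"), ("a", "y"),
    ("b", "y"), ("b", "z"),
    ("c", "x"), ("c", "z"),
    ("d", "x"),  # d has only one out-link and is pruned when p = 2
]
fans, centers = extract_dense_core(links, p=2, q=2)
```

In this toy graph the pruning removes only the link from d, leaving a, b, c as fans and x, y, z as centers, even though no fan links to all three centers.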
