Abstract
With the rapid development of Cloud-based services, a Cloud service discovery engine has become a fundamental requirement. A semantic focused crawler is one of the key components of such an engine. However, the sheer number and functional diversity of Cloud services on the Web make it difficult for crawlers to retrieve relevant Cloud services effectively, and amid this explosion of information it is challenging for semantic crawlers to restrict their search to URLs that actually offer Cloud services. To address these issues, this paper proposes a self-adaptive semantic focused crawler based on Latent Dirichlet Allocation (LDA) for efficient Cloud service discovery. We present a Cloud Service Ontology (CSOnt) that defines Cloud service categories; its set of concepts allows the crawler to collect and categorize Cloud services automatically. Moreover, the proposed crawler adopts URL prioritization techniques that order the URLs awaiting parsing, so that relevant Cloud services are retrieved efficiently. Finally, the crawler is self-adaptive: an ontology-learning function automatically improves CSOnt and maintains the crawler's performance over time.
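URL prioritization in a focused crawler is commonly realized as a priority-queue frontier, where the most relevant unvisited URL is parsed next. The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each candidate URL already has a relevance score. Here that score is, hypothetically, the cosine similarity between a page's LDA topic distribution and the topic distribution of a target CSOnt category. It also assumes that URLs scoring below a hypothetical threshold `min_score` are discarded. The names `UrlFrontier` and `cosine`, and the toy topic vectors, are illustrative only.

```python
import heapq
import itertools
import math


def cosine(p: list[float], q: list[float]) -> float:
    """Cosine similarity between two topic-probability vectors (assumed scoring rule)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


class UrlFrontier:
    """Hypothetical max-priority URL frontier: the highest-scoring URL is popped first."""

    def __init__(self, min_score: float = 0.0) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()
        self._counter = itertools.count()  # tie-breaker: equal scores pop in insertion order
        self.min_score = min_score  # assumed relevance threshold, not from the paper

    def push(self, url: str, score: float) -> bool:
        """Enqueue an unseen URL if its score meets the threshold; return True if enqueued."""
        if url in self._seen or score < self.min_score:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the highest score first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self) -> tuple[str, float]:
        """Remove and return the most relevant pending URL with its score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


# Toy usage: the category vector and page vectors stand in for LDA topic distributions.
storage_category = [0.7, 0.2, 0.1]
candidates = {
    "https://example.com/blog/recipes": [0.05, 0.05, 0.9],
    "https://example.com/backup-service": [0.6, 0.3, 0.1],
    "https://example.com/object-storage": [0.8, 0.1, 0.1],
}

frontier = UrlFrontier(min_score=0.5)
for url, topics in candidates.items():
    frontier.push(url, cosine(topics, storage_category))

while frontier:
    print(frontier.pop())
```

With these toy vectors the off-topic recipes page falls below the threshold and is never enqueued, while the two storage-related pages are parsed in order of decreasing similarity. A heap keeps each insertion and removal at logarithmic cost, which matters once the frontier holds many URLs; the `_seen` set prevents re-enqueuing a URL discovered from several pages.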