Webcrawling for a Biological Strategy Corpus to Support Biologically-Inspired Design

D. Vandevenne, J. Caicedo, S. Dewulf, J. R. Duflou, P.-A. Verhaegen

doi:10.1007/978-1-4471-4507-3_9

Abstract

As part of a larger effort to develop a tool that supports ideation in the early stage of Biologically-Inspired Design (BID), this paper addresses the first important research challenge: any scalable approach towards such a tool requires a large corpus of biological strategies. This corpus should contain as much of the world’s knowledge about how organisms tackle problems as possible, and it should be updated automatically. However, no such resource or system currently exists. This paper presents a scalable webcrawling approach that continuously searches the Internet for biological strategies and keeps its knowledge base up to date without manual intervention. The webcrawler solves this needle-in-a-haystack task by combining different classifiers to score the relevance of web documents to the envisaged corpus. It uses these scores to focus future crawling and thereby gain efficiency, which makes it possible to continuously harvest new biological strategy documents. Finally, the possible applications of this contribution are positioned within the existing approaches to systematic BID.
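
The score-driven focusing summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the two keyword scorers are hypothetical stand-ins for trained classifiers, the equal weights and the threshold are arbitrary assumptions, and `TOY_WEB` is an invented in-memory graph that replaces real HTTP fetching. The only point shown is that links found on high-scoring pages are fetched before links found on low-scoring ones.

```python
import heapq

# Hypothetical stand-ins for trained classifiers; each returns a score in [0, 1].
def organism_score(text):
    terms = {"gecko", "lotus", "shark", "termite"}
    return min(1.0, len(set(text.lower().split()) & terms) / 2)

def strategy_score(text):
    terms = {"adheres", "repels", "reduces", "regulates"}
    return min(1.0, len(set(text.lower().split()) & terms) / 2)

def relevance(text, weights=(0.5, 0.5)):
    """Combine the classifier scores with an (assumed) weighted average."""
    scores = (organism_score(text), strategy_score(text))
    return sum(w * s for w, s in zip(weights, scores))

def focused_crawl(fetch, seeds, threshold=0.4, max_pages=100):
    """Best-first crawl: outlinks inherit the relevance score of their parent page.

    `fetch(url)` must return `(text, links)` or `None` if the page is unavailable.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    corpus = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        fetched += 1
        text, links = page
        score = relevance(text)
        if score >= threshold:
            corpus.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return corpus

# Invented toy web graph: url -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("portal of links", ["bio", "news"]),
    "bio": ("the gecko adheres to walls", ["lotus"]),
    "news": ("stock market update today", ["sports"]),
    "lotus": ("the lotus leaf repels water", []),
    "sports": ("football results", []),
}

corpus = focused_crawl(TOY_WEB.get, ["seed"], max_pages=3)
print(corpus)
```

With a budget of three fetches, the sketch reaches both relevant pages because the link found on the relevant "bio" page outranks the remaining links from the irrelevant seed page. A breadth-first crawl with the same budget would spend its third fetch on "news" instead.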

Full Text