Concept-based knowledge discovery in texts extracted from the Web

Stanley Loh, José Palazzo M. de Oliveira, Leandro Krug Wives

doi:10.1145/360402.360414

Abstract

This paper presents an approach for knowledge discovery in texts extracted from the Web. Instead of analyzing words or attribute values, the approach is based on concepts, which are extracted from texts and used as characteristics in the mining process. Statistical techniques are applied to these concepts in order to find interesting patterns in concept distributions or associations. In this way, users can perform discovery at a high level, since concepts describe real-world events, objects, thoughts, etc. To identify concepts in texts, a categorization algorithm is used in conjunction with a prior classification task in which the concepts are defined. Two experiments are presented: one for political analysis and the other for competitive intelligence. Finally, the approach is discussed, examining its problems and advantages in the Web context.

Keywords: Knowledge discovery, data mining, information extraction, categorization, text mining.

1. INTRODUCTION

The Web is a large and growing collection of texts. This volume of text is becoming a valuable source of information and knowledge. As Garofalakis and colleagues comment,
