Abstract

In view of the exponential growth of online document corpora, even perfect retrieval will return more material than a user can cope with. One way to reduce this problem is automatic domain-specific summarization tailored to the user's needs, which is a kind of high-level data cleaning. This requires some method of discovering classes of similar items that may be grouped into predetermined domains. We explore whether there is a synergistic relation between classification systems and summarization systems by composing those subsystems. In other words, we examine whether prior summarization improves the performance of the classifier and, conversely, whether prior classification improves the performance of the summarizer. In both cases, the answer is affirmative, as we show in this paper. We propose a text-mining framework in which these subsystems are treated as constituents of a knowledge discovery process for text corpora.
