Abstract

We present completely automatic procedures for the self-generation of meaningful groups among the members of a document collection and for classifying subsequent documents according to these groups. These procedures operate on large document collections with reasonably short computation times. Thus far, in our experiments on the physics literature, automatic classification has proven to be as good as or better than manual indexing and, in addition, potentially less expensive. Our method, which was derived empirically, is based on a pattern discovery algorithm that uses only the citation content of a document and operates on the bibliographic links among papers. The self-generated groups correspond to very specific subject headings; retrospective bibliographies generated by the procedures allow one to classify the subsequent literature with remarkably high recall and relevance ratios, close to 100%.
