Set Of Index Terms Research Articles

Research on automatic hypertext construction has grown rapidly in the last decade because there is an urgent need to translate the enormous volume of legacy documents into web pages. Unlike traditional 'flat' texts, a hypertext contains navigational hyperlinks that point to related hypertexts or to other locations in the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the enormous number of documents produced each day makes such manual construction impractical. An automatic hypertext construction method is therefore necessary for content providers to produce usable information for web surfers efficiently. Although most web pages contain non-textual data such as images, sounds, and video clips, text still carries most of the information on a page. It is therefore not surprising that most automatic hypertext construction methods derive from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within these training documents. The constructed hyperlinks are inserted into the training documents to translate them into hypertext form, and the translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we only used Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms.
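The clustering step can be illustrated with a generic self-organizing map trained on documents reduced to sets of index terms. This is a minimal sketch, not the authors' method: the abstract does not specify the document representation, map size, training schedule, or how the two maps are built, so the binary term vectors, the 2 x 2 grid, the linear decay schedule, the helper names (`to_vector`, `best_matching_unit`, `train_som`) and the toy corpus below are all assumptions for illustration.

```python
import math
import random


def to_vector(terms, vocabulary):
    """Binary bag-of-index-terms vector (assumed representation): 1.0 if the term occurs."""
    return [1.0 if term in terms else 0.0 for term in vocabulary]


def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest to x in squared Euclidean distance."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid cell, randomly initialised in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            # Learning rate and neighbourhood radius shrink linearly (assumed schedule).
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Cells near the winning cell on the grid are pulled harder toward x.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical toy corpus: each document is already reduced to a set of index terms.
docs = {
    "d1": {"stock", "market", "price"},
    "d2": {"stock", "market", "trade"},
    "d3": {"rain", "storm", "flood"},
    "d4": {"rain", "storm", "wind"},
}
vocabulary = sorted(set().union(*docs.values()))
vectors = [to_vector(terms, vocabulary) for terms in docs.values()]
som = train_som(vectors, rows=2, cols=2)
clusters = {name: best_matching_unit(som, v) for name, v in zip(docs, vectors)}
print(clusters)
```

Documents that land in the same or neighbouring cells share many index terms, which is the property a map-based linking scheme can exploit; how the original method turns the maps into hyperlink sources and destinations is not described in the abstract.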

An important stage in the process of retrieving objects from a hypertext database is the creation of a set of internodal links intended to represent the relationships between objects; this operation is often undertaken manually, just as index terms are often manually assigned to documents in a conventional retrieval system. An earlier article (Ellis, D., Furner-Hines, J., & Willett, P., 1994b) reported a study in which several different sets of links were inserted, each by a different person, between the paragraphs of each of a number of full-text documents. Its results showed little similarity between the link-sets, a finding comparable with those of studies of inter-indexer consistency, which suggest that there is generally only a low level of agreement between the sets of index terms assigned to a document by different indexers. In this article, we describe an investigation into the relationship between (i) the levels of inter-linker consistency found among the hypertext databases used in our earlier experiments, and (ii) the effectiveness of a number of searches carried out in those databases. We describe how the searches were implemented and how the numerical values expressing their effectiveness were calculated. A comparison between the recorded levels of consistency and those of effectiveness does not allow us to draw conclusions about the consistency-effectiveness relationship equivalent to those drawn in comparable studies of inter-indexer consistency. © 1996 John Wiley & Sons, Inc.
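Inter-linker consistency can be quantified in the same way as inter-indexer consistency. The sketch below assumes two measures common in the indexing literature, Hooper's C / (A + B - C) and Rolling's 2C / (A + B), where A and B are the sizes of two link-sets and C is the number of links they share. The abstract does not state which measure the study used; treating a link as an unordered pair of paragraph numbers, scoring two empty link-sets as fully consistent, and the helper names and link-sets below are also assumptions.

```python
from itertools import combinations


def normalise(links):
    """Treat each link as an unordered pair of paragraph ids (an assumption)."""
    return {frozenset(pair) for pair in links}


def hooper(links_a, links_b):
    """Hooper's measure: C / (A + B - C), shared links over all distinct links."""
    a, b = normalise(links_a), normalise(links_b)
    common = len(a & b)
    denominator = len(a) + len(b) - common
    # Two empty link-sets are scored as fully consistent (a convention assumed here).
    return common / denominator if denominator else 1.0


def rolling(links_a, links_b):
    """Rolling's measure: 2C / (A + B)."""
    a, b = normalise(links_a), normalise(links_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def mean_pairwise(link_sets, measure):
    """Average consistency over every pair of linkers for one document."""
    pairs = list(combinations(link_sets, 2))
    if not pairs:
        raise ValueError("need at least two link-sets")
    return sum(measure(x, y) for x, y in pairs) / len(pairs)


# Hypothetical link-sets produced by three linkers for the same document.
linker_1 = [(1, 2), (2, 5), (3, 4)]
linker_2 = [(2, 1), (3, 4), (4, 6)]
linker_3 = [(1, 2), (5, 6)]

print(hooper(linker_1, linker_2))  # 0.5
print(rolling(linker_1, linker_2))
print(mean_pairwise([linker_1, linker_2, linker_3], hooper))
```

Here linkers 1 and 2 share two of four distinct links, so Hooper's measure is 0.5; Rolling's measure is always at least as high as Hooper's for the same pair.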

Articles published on Set Of Index Terms

Set-based vector model

A text mining approach for automatic construction of hypertexts

Navigation via similarity: Automatic linking based on semantic closeness

On the creation of hypertext links in full‐text documents: Measurement of retrieval effectiveness

All in the mind: concept analysis in indexing

On the creation of hypertext links in full‐text documents: Measurement of inter‐linker consistency

A probability distribution model for information retrieval

Trends in research on information retrieval — The potential for improvements in conventional Boolean retrieval systems

Probabilistic methods for ranking output documents in conventional Boolean retrieval systems

Towards a theory of document learning

Dynamic dictionary updating

Computer assisted indexing

Source Indexing of IEEE Publications
