Abstract

This chapter discusses the application of the self-organizing map (SOM) to the mapping of documents. Each document is represented statistically by its weighted word-frequency histogram, or by a reduced representation of that histogram, and is treated as a data vector. One SOM of about seven million documents has been constructed, covering all patent abstracts in the world that are written in English and available in electronic form. The map consists of about one million models. Keywords or key texts can be used to search for the most relevant documents first. New, effective coding and computational schemes for the mapping are described. The document organization, searching, and browsing system, called WEBSOM, is described in this chapter. The original WEBSOM was a two-level SOM architecture, but it was later simplified.
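
As a rough illustration of the representation described above, the following Python sketch turns a text into a weighted word-frequency vector and finds the closest model vector on a map. It is a toy under stated assumptions, not the WEBSOM implementation: the vocabulary, the word weights, the two model vectors, and the function names `weighted_histogram` and `best_matching_unit` are all hypothetical, and map training as well as the reduced representations mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def weighted_histogram(text, vocabulary, weights):
    """Return a weighted word-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    # Words absent from the text get a count of 0.
    return [counts[word] * weights[word] for word in vocabulary]


def best_matching_unit(vector, models):
    """Return the index of the model closest to `vector` (Euclidean distance)."""
    distances = [math.dist(vector, model) for model in models]
    return distances.index(min(distances))


# Hypothetical toy data: a four-word vocabulary, illustrative weights,
# and a "map" of only two model vectors.
vocabulary = ["engine", "fuel", "antenna", "signal"]
weights = {"engine": 1.0, "fuel": 1.5, "antenna": 1.0, "signal": 2.0}
models = [
    [2.0, 1.5, 0.0, 0.0],  # model resembling engine/fuel documents
    [0.0, 0.0, 1.0, 2.0],  # model resembling antenna/signal documents
]

doc = "fuel engine with improved fuel injection"
vec = weighted_histogram(doc, vocabulary, weights)
bmu = best_matching_unit(vec, models)
```

A keyword query can be handled the same way in this sketch: encode the keywords with `weighted_histogram` and look up the closest model with `best_matching_unit`, then inspect the documents mapped to that model first.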
