Abstract

Latent Semantic Indexing (LSI) is a latent structure model that uses statistical computation to analyze large text collections quickly and accurately. It extracts the latent semantic connections between terms, highlights the key meanings in a text, and weakens the adverse effects of polysemy. LSI simplifies the text vectors and reduces their dimensionality while offering high recall and fast retrieval. Using spam filtering as a running example, this article introduces in detail the theoretical basis of LSI: singular value decomposition and the construction of a multi-dimensional concept space. The key weight-calculation step, TF-IDF, is optimized with a "Sigmoid function" and a "location factor", which further emphasizes the differing importance of words in a text and is more conducive to constructing the latent semantic structure space. Finally, the paper discusses the use of LSI in retrieval and search and briefly introduces two application examples: research on job description clustering and the construction of a patent information classification system.
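
As a rough illustration of the pipeline the abstract describes (TF-IDF weighting, singular value decomposition, projection into a low-dimensional concept space), the following is a minimal sketch in Python with NumPy. The toy messages, the function names, and the choice of k = 2 are assumptions made for illustration and are not taken from the paper. The sketch uses plain TF-IDF with a common smoothing term; the paper's "Sigmoid function" and "location factor" adjustments are not reproduced, because the abstract does not give their formulas.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): two spam-like, two ordinary messages.
docs = [
    "win free prize now",
    "free prize claim now",
    "meeting agenda for monday",
    "monday meeting notes",
]

def tfidf_matrix(texts):
    """Return (vocab, terms-by-documents matrix) weighted with plain TF-IDF."""
    tokenized = [t.split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            counts[index[w], j] += 1
    tf = counts / counts.sum(axis=0, keepdims=True)  # term frequency per document
    df = (counts > 0).sum(axis=1)                    # document frequency per term
    idf = np.log(len(texts) / df) + 1.0              # assumed smoothing variant
    return vocab, tf * idf[:, None]

def lsi_doc_vectors(matrix, k):
    """Project documents into a k-dimensional concept space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, A = tfidf_matrix(docs)
doc_vecs = lsi_doc_vectors(A, k=2)
spam_sim = cosine(doc_vecs[0], doc_vecs[1])   # two spam-like messages
cross_sim = cosine(doc_vecs[0], doc_vecs[2])  # spam-like vs. ordinary message
```

In this toy setting the two spam-like messages should land close together in the concept space, while a spam-like and an ordinary message should be nearly orthogonal. A real spam filter would work on a much larger vocabulary and would need k tuned on held-out data.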
