Abstract
The main function of an information retrieval (IR) system is to retrieve, efficiently and accurately, a minimal subset of documents relevant to the user's information need. Synonymy and polysemy are obstacles for natural language processing algorithms because they lead to overestimation and misrepresentation. The proposed model uses the implicit higher-order structure in the association of terms with documents to improve the identification of relevant documents based on the terms used in queries, and an enhanced automatic indexing approach is suggested. The study uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to assign a weight to each term in a document. Each document is represented as a vector of weights in the term space, and the user query is likewise represented as a vector of weights. Finally, Singular Value Decomposition (SVD) is used to factorize the large weighted term-document matrix into a collection of vectors that approximate the original matrix. Cosine similarity is then used to determine the document vector closest to the user query. For English information retrieval, it was observed that TF-IDF showed higher performance below a term percentage of 0.3, while Latent Semantic Indexing (LSI) was more stable than TF-IDF, especially with respect to word association.
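The pipeline described above (TF-IDF weighting, SVD of the term-document matrix, cosine ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy corpus, the whitespace tokenizer, the smoothed IDF formula log(N/df) + 1, the rank k = 2, and all function names (tfidf_matrix, query_vector, cosine, lsi_rank) are assumptions made for illustration, and NumPy is assumed to be installed.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a terms-by-documents TF-IDF matrix (tokenizer and IDF form are assumptions)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF: log(N / df) + 1, one common variant among several.
    idf = np.array([math.log(len(docs) / df[t]) + 1.0 for t in vocab])
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t, count in Counter(doc).items():
            A[index[t], j] = count * idf[index[t]]
    return index, idf, A


def query_vector(query, index, idf):
    """Weight the query in the same term space; unknown terms are ignored."""
    q = np.zeros(len(index))
    for t, count in Counter(query.lower().split()).items():
        if t in index:
            q[index[t]] = count * idf[index[t]]
    return q


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsi_rank(docs, query, k=2):
    """Rank documents against the query in a rank-k latent space."""
    index, idf, A = tfidf_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                      # top-k left singular vectors
    doc_vecs = (U_k.T @ A).T            # one latent vector per document
    q_k = U_k.T @ query_vector(query, index, idf)  # project the query the same way
    scores = [cosine(q_k, d) for d in doc_vecs]
    ranking = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    corpus = [
        "car engine repair",
        "automobile engine repair",
        "fruit banana apple",
        "apple banana juice",
    ]
    ranking, scores = lsi_rank(corpus, "automobile", k=2)
    for j in ranking:
        print(f"{scores[j]:.3f}  {corpus[j]}")
```

On this toy corpus, the query "automobile" shares no term with "car engine repair", so plain TF-IDF cosine similarity between them is zero. In the rank-2 latent space the two vehicle documents collapse onto the same direction, and the query scores close to 1 against both. This is the kind of word-association effect the abstract attributes to LSI, shown here only on a contrived example.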