Abstract

Text mining is the process to deriving useful information from unstructured text data. During this process, text mining uses statistical and mathematical methods. Major text mining tasks include text categorization, text clustering, concept extraction, document summarization, semantic similarity and author identification. In this study, semantic similarity issues have been examined. Semantic similarity analysis aims to determine semantic similarity between texts. Probabilistic latent semantic analysis and latent Dirichlet allocation are probabilistic methods to determine semantic similarity between texts. In this study, semantic analysis using probabilistic latent semantic analysis and latent Dirichlet allocation methods is examined. Also, an application which is conducted to analyze semantic similarity and classify Turkish textual data chosen from different news agencies is discussed. R statistical programming language and Matlab are used in the application.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call