Abstract

ObjectivesMeasurement of similarities between documents is typically influenced by the sparseness of the term-document matrix employed. Latent semantic indexing (LSI) may improve the results of this type of analysis.MethodsIn this study, LSI was utilized in an attempt to reduce the term vector space of clinical documents and newspaper editorials.ResultsAfter applying LSI, document similarities were revealed more clearly in clinical documents than editorials. Clinical documents which can be characterized with co-occurring medical terms, various expressions for the same concepts, abbreviations, and typographical errors showed increased improvement with regards to a correlation between co-occurring terms and document similarities.ConclusionsOur results showed that LSI can be used effectively to measure similarities in clinical documents. In addition, correlation between the co-occurrence of terms and similarities realized in this study is an important positive feature associated with LSI.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call