Abstract

It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval, such as the production of free word associations and the generation of synonyms, can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences observed in texts. The generation of synonyms can also be based on co-occurrence data but requires second-order statistics. Both approaches are compared and validated on empirical data. For both tasks, the performance of the simulation is comparable to that of human subjects.

Keywords

Information Retrieval, Independent Component Analysis, Semantic Similarity, Stimulus Word
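The distinction between first-order and second-order statistics can be illustrated with a minimal sketch. The abstract does not specify the association measures, window size, or corpora used, so everything below is an assumption made for illustration: the toy corpus, the window of two tokens, raw co-occurrence counts as the first-order score, and cosine similarity of co-occurrence vectors as the second-order score. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_counts(sentences, window=2):
    """Count how often each word pair occurs within `window` tokens (assumed window)."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return dict(counts)  # plain dict, so lookups never create new keys


def first_order_associates(counts, stimulus, k=3):
    """First-order: rank words by how often they co-occur with the stimulus."""
    return [w for w, _ in counts.get(stimulus, Counter()).most_common(k)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def second_order_neighbours(counts, target, k=3):
    """Second-order: rank words by similarity of their co-occurrence vectors."""
    target_vec = counts.get(target, Counter())
    scores = {w: cosine(target_vec, vec) for w, vec in counts.items() if w != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corpus (invented for illustration, not data from the paper).
corpus = [
    ["doctor", "treats", "patient"],
    ["physician", "treats", "patient"],
    ["doctor", "examines", "patient"],
    ["physician", "examines", "patient"],
    ["bread", "with", "butter"],
    ["bread", "and", "butter"],
]

counts = cooccurrence_counts(corpus)
print(first_order_associates(counts, "doctor", k=1))
print(second_order_neighbours(counts, "doctor", k=1))
```

In this toy corpus, "doctor" and "physician" never appear in the same sentence, so a first-order measure cannot relate them. They do share the same contexts, which is what the second-order comparison picks up. This is consistent with the abstract's statement that synonym generation requires second-order statistics, although the measures actually used in the work may differ from the ones assumed here.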
