Abstract

Information search is a lucrative activity for electronic commerce, contributing significant revenue. As search engines become the primary source of health information, search quality is a growing concern for consumers. Specifically, the health vocabulary used by general consumers differs from that of professionals, leading to unsatisfactory retrieval performance and misleading information. The situation becomes more complex when consumers search for health information across languages. To study consumer health search (CHS) across languages, we propose a cross-lingual retrieval framework comprising a semantic space construction module and an information retrieval module. The semantic space construction module adopts a weakly supervised approach to learn a cross-lingual word space (CLWS) from collected consumer-generated health content, which helps capture consumers’ vocabulary and identify translations of medical expressions. The information retrieval module proposes strategies that use the CLWS translations in the subsequent retrieval tasks. Evaluating performance on two health information search engines, we found that Google Translate (GT), a widely adopted translation service, is prone to mistranslating colloquial medical expressions. The CLWS helps identify GT’s mistranslations and filter out the results retrieved with them. More importantly, our framework demonstrates a strategy for integrating the CLWS and GT translations to achieve the best retrieval performance on the cross-lingual CHS task.

