Abstract

Semantic text similarity (STS) is evaluated on dedicated test collections, which consist of text pairs that share the same meaning despite differing in surface form. Such collections are scarce compared with information retrieval (IR) test collections. This paper investigates whether IR test collections can be reused for STS tasks. Text pairs are derived from the relevant pairs of the IR test collections. Latent semantic analysis (LSA) and explicit semantic analysis (ESA) are used to evaluate the Glasgow test collections, which are provided by the ACM SIGIR community. The Jaccard index measures lexical similarity. Recall measures the retrievability of the recycled test collections, with two existing test collections, the Microsoft Research Paraphrase Corpus and the Microsoft Research Video Description Corpus, serving as evaluation baselines. The evaluation yields a promising outcome: the evaluated test collections have a low Jaccard index, and their recall values lie between those of the two baselines.
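As a rough illustration of two steps named in the abstract, the sketch below derives text pairs from IR relevance judgments and scores their lexical overlap with the Jaccard index, J(A, B) = |A ∩ B| / |A ∪ B| over token sets. It is a minimal sketch, not the paper's implementation: the function names, the regex tokenizer, the assumption that a "relevant pair" is a query paired with a document judged relevant, and the toy data are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(text_a, text_b):
    """Jaccard index of the two texts' token sets; 0.0 when both are empty."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def derive_pairs(queries, documents, qrels):
    """Map relevance judgments (query_id, doc_id) to (query text, document text) pairs.

    Assumes, for illustration only, that each judged-relevant query-document
    pair is treated as one candidate STS text pair.
    """
    return [
        (queries[qid], documents[did])
        for qid, did in qrels
        if qid in queries and did in documents
    ]


if __name__ == "__main__":
    # Toy data, invented for this example; not taken from any real collection.
    queries = {"q1": "heat transfer on aircraft wings"}
    documents = {"d1": "aerodynamic heating of wing surfaces at high speed"}
    qrels = [("q1", "d1")]

    for query_text, doc_text in derive_pairs(queries, documents, qrels):
        print(round(jaccard(query_text, doc_text), 3), "|", query_text, "|", doc_text)
```

A low score here means the two texts share few words. For a pair judged relevant, that is the case in which lexical matching alone is least informative, which is how the abstract's "low Jaccard index" result can be read.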
