Abstract

Word Mover's Distance (WMD) is a hyperparameter-free document distance metric with an intelligible interpretation and unprecedented accuracy on document classification. WMD is based on word embeddings and focuses largely on semantic rather than syntactic relationships, which limits its ability to measure document distance. To strengthen the role of syntactic information, we propose a new method called WMD with Part-of-Speech (PWMD), which integrates part-of-speech (POS) information into the original WMD model. POS is a kind of syntactic information, and combining it with WMD provides more valuable features for the document distance metric. PWMD provides two strategies for combining POS tags with WMD: "word level" and "document level". Comparative experiments show that PWMD yields a better document distance measure than WMD.
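To make the idea concrete, the following is a minimal sketch of where POS information could enter a WMD-style computation. It does not reproduce the PWMD formulation, which the abstract does not specify. Two assumptions are made for illustration only: the sketch computes the relaxed WMD lower bound (each word sends all of its mass to its nearest word in the other document) rather than solving the full transport problem, and the per-word `weights_a` / `weights_b` arguments are hypothetical POS-derived weights, not the paper's scheme. All function names and the toy vectors are likewise hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative word weights so they sum to 1 (word 'mass')."""
    total = sum(weights)
    return [w / total for w in weights]


def one_sided_cost(src_vecs, src_mass, dst_vecs):
    """Cost when every source word moves all its mass to its nearest target word."""
    return sum(
        mass * min(math.dist(v, u) for u in dst_vecs)
        for v, mass in zip(src_vecs, src_mass)
    )


def relaxed_wmd(vecs_a, vecs_b, weights_a=None, weights_b=None):
    """Relaxed WMD lower bound between two documents given as word vectors.

    weights_a / weights_b are hypothetical per-word weights (for example,
    derived from POS tags); uniform weights are used when they are omitted.
    """
    mass_a = normalize(weights_a if weights_a is not None else [1.0] * len(vecs_a))
    mass_b = normalize(weights_b if weights_b is not None else [1.0] * len(vecs_b))
    # Take the tighter of the two one-sided relaxations.
    return max(
        one_sided_cost(vecs_a, mass_a, vecs_b),
        one_sided_cost(vecs_b, mass_b, vecs_a),
    )


# Toy 2-D "embeddings" (illustrative values, not real word vectors).
doc_a = [(0.0, 0.0), (1.0, 0.0)]
doc_b = [(0.0, 0.0)]

uniform = relaxed_wmd(doc_a, doc_b)                         # 0.5
pos_weighted = relaxed_wmd(doc_a, doc_b, weights_a=[3, 1])  # 0.25
```

In this toy case, giving the first word of `doc_a` three times the weight of the second halves the distance, because the heavier word already coincides with the only word in `doc_b`. A word-level strategy might reweight individual words in this way, whereas a document-level strategy would presumably combine a POS-based document comparison with the WMD score; both readings are guesses from the abstract alone.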

