Abstract
Real-time news documents published on the Internet have global financial and political impacts. Pioneering statistical approaches investigate manually defined features to capture lexical, sentiment, and event information, which suffer from feature sparsity. As a remedy, recent work has considered learning dense vector representations for documents. Such representations are general, which can not model target-dependent scenarios, such as stance detection towards a specific claim. There has been work on target-specific word and sentence representations, but little was done on target-dependent document representation. Moreover, documents contain more potentially helpful information, but also noise compared to events and sentences. To address the above issues, we focus on models that are: 1. task-driven , which optimize the neural network representations for the end task; 2. target-specific , learning news representations by considering the influence of specific targets. In particular, we propose a novel document-level target-dependent learning framework TEND. The framework employs the information of the target and the news abstract as clues, obtaining relatively informative sentences from the entire document for our objectives. The framework assembles a document representation by integrating the news abstract representation and a weighted sum of sentence representations in the document. To the best of our knowledge, we are among the first to investigate target-dependent document representation. Existing text representation models can be easily integrated into our TEND framework, and it is general enough to be applied to different target-dependent document representation tasks. We empirically evaluate our framework on two target-dependent document-level tasks, including a cumulative abnormal return prediction task and a news stance detection task. Results show that our models give the best performances compared to state-of-the-art document embedding methods, yielding robust and consistent performances across datasets.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.