Keyword Extraction and Headline Generation Using Novel Word Features

Songhua Xu, Francis Lau, Shaohui Yang

doi:10.1609/aaai.v24i1.7511

Abstract

We introduce several novel word features for keyword extraction and headline generation. These features are derived from a document's background knowledge as supplied by Wikipedia. To acquire that background knowledge for a given document, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find Wikipedia articles that are closely related to the document's contents. From each article in the search result set, we extract the inlink, outlink, category, and infobox information and use it to derive a set of novel word features that reflect the document's background knowledge. These features indicate the importance of individual words in the input document and complement the traditional word features derivable from a document's explicit information. In addition, we introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation experimentally and obtain very encouraging results.
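
The abstract does not spell out how the features are computed, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the related Wikipedia articles have already been retrieved and are represented as plain dictionaries with hypothetical keys `inlinks`, `outlinks`, `categories`, and `infobox`; the function names and the scoring rule (the fraction of retrieved articles whose field mentions the word) are likewise illustrative assumptions.

```python
import re

# Hypothetical field names for the four kinds of Wikipedia information
# mentioned in the abstract; the real data layout is not given there.
FIELDS = ("inlinks", "outlinks", "categories", "infobox")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def background_word_features(doc_text, articles):
    """Score each document word by how often it appears in each field.

    For every word in the document and every field, the score is the
    fraction of retrieved articles whose field text contains the word.
    This scoring rule is an assumption made for illustration.
    """
    doc_words = set(tokenize(doc_text))
    features = {w: dict.fromkeys(FIELDS, 0.0) for w in doc_words}
    if not articles:
        return features

    for article in articles:
        for field in FIELDS:
            # Collect every token that occurs in this field of the article.
            field_words = set()
            for entry in article.get(field, []):
                field_words.update(tokenize(entry))
            # Count the article once for each document word it mentions.
            for word in doc_words & field_words:
                features[word][field] += 1.0

    for word in features:
        for field in FIELDS:
            features[word][field] /= len(articles)
    return features


if __name__ == "__main__":
    document = "Mars rover lands on the red planet"
    retrieved = [
        {
            "inlinks": ["NASA", "Mars rover"],
            "outlinks": ["Mars"],
            "categories": ["Planets"],
            "infobox": ["planet"],
        },
        {
            "inlinks": ["Rover (space exploration)"],
            "outlinks": [],
            "categories": ["Mars exploration"],
            "infobox": [],
        },
    ]
    for word, scores in sorted(background_word_features(document, retrieved).items()):
        print(word, scores)
```

In this toy setup a word such as "rover", which appears in the inlink titles of both retrieved articles, scores higher on the inlink feature than a function word such as "the", which appears in none of the fields. Such scores would then be combined with conventional word features by whatever ranking model is used.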

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jul 5, 2010
Citations: 42