Abstract

We present EcForest, an extractive summarization model built on Enhanced Sentence Embedding and Cascade Forest. Sentence representation is central to many summarization methods. Bag-of-words representations mostly fail to capture semantics, and typical embedding models miss more complex semantic features, such as polysemy and phrase meaning, which are usually lost when a sentence is represented by simply averaging its word embeddings. To address these drawbacks, we propose the Enhanced Sentence Embedding (ESE) model, which maps several valid features to dense vectors. In essence, ESE is a novel model for improving the distributed representation of sentences. It is broadly applicable and can be adapted to other NLP tasks. Moreover, deep forest is used as the sentence extraction algorithm because it is robust to hyper-parameter settings and trains efficiently compared with deep neural networks. An evaluation of the model variants proposed in this work supports the validity of the enhanced sentence embedding. Comparisons between EcForest and several baselines on two different datasets show that the proposed summarization model outperforms, or is highly competitive with, the state of the art.
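The limitation of plain averaging mentioned above can be illustrated with a minimal sketch. It is not the ESE model itself: the toy word vectors, the position feature, and the function names `average_embedding` and `feature_augmented_embedding` are all assumptions made for illustration. It shows only that averaging discards word order, and that concatenating a dense feature vector is one simple way to carry extra information in a sentence representation.

```python
import numpy as np

# Toy 3-dimensional word vectors; the values are invented for illustration.
WORD_VECTORS = {
    "dog": np.array([1.0, 0.0, 0.0]),
    "bites": np.array([0.0, 1.0, 0.0]),
    "man": np.array([0.0, 0.0, 1.0]),
}

# Hypothetical feature table: one dense vector per sentence-position bucket.
POSITION_VECTORS = {
    "lead": np.array([1.0, 0.0]),
    "body": np.array([0.0, 1.0]),
}


def average_embedding(tokens):
    """Baseline: the mean of the word vectors in the sentence."""
    return np.mean([WORD_VECTORS[t] for t in tokens], axis=0)


def feature_augmented_embedding(tokens, position):
    """Concatenate the averaged vector with a dense feature vector."""
    return np.concatenate(
        [average_embedding(tokens), POSITION_VECTORS[position]]
    )


# Two sentences with opposite meanings get the same averaged vector.
a = average_embedding(["dog", "bites", "man"])
b = average_embedding(["man", "bites", "dog"])
same = bool(np.allclose(a, b))

# The same words in different positions get different augmented vectors.
lead = feature_augmented_embedding(["dog", "bites", "man"], "lead")
body = feature_augmented_embedding(["dog", "bites", "man"], "body")
```

Here `same` is true because the mean ignores token order, while `lead` and `body` differ only in the appended feature dimensions. Which features the actual model maps to dense vectors, and how they are combined, is not specified in this abstract.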
