Abstract

Recently, a new form of structured summary for scientific papers has been explored, built by grouping the cited text spans of a reference paper. Its primary goal is to generate summaries based on the cited paper itself. Traditional scientific summarization has instead relied on citation-based methods that aggregate all citances citing a given paper without any content-based citation analysis, even though citations may differ across researchers or over time. By examining the original text spans that scholars cite, the new method can reflect the exact contributions of reference papers more faithfully. Accurately identifying cited text spans is therefore the first important problem to solve. In general, the problem can be cast as finding the sentences in the reference paper that are most similar to the citation sentences. Treating it as a classification task, we investigate the potential of four actions to improve identification performance. First, we carefully select features for multiple classifiers. Second, we apply sampling-based algorithms to preprocess class-imbalanced datasets. Third, because we integrate results through a weighted voting system, we tune parameters such as the voting weights used for multi-classifier integration, as well as running settings, to see whether performance can be improved further. Evaluation results show the effectiveness of each action and demonstrate that researchers can take these actions to identify cited text spans more accurately when doing scientific summarization.
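
As a rough illustration of the pipeline described above, the following minimal sketch shows a similarity feature between a citance and a reference sentence, random oversampling for a class-imbalanced training set, and weighted voting over several classifiers' binary predictions. All function names, the Jaccard word-overlap feature, and the example weights are assumptions made for illustration; the abstract does not specify the actual features, sampling algorithms, or classifiers used.

```python
import random


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (an assumed stand-in feature)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions from several classifiers.
    Returns 1 only if the positive votes carry more than half the total weight."""
    positive = sum(w for p, w in zip(predictions, weights) if p == 1)
    return 1 if positive > sum(weights) / 2 else 0


# Hypothetical usage: score candidate reference sentences against one citance.
citance = "weighted voting improves cited span identification"
reference_sentences = [
    "we integrate classifiers with weighted voting",
    "the corpus contains ten papers",
]
similarities = [jaccard(citance, s) for s in reference_sentences]

# Cited sentences (label 1) are typically far rarer than non-cited ones (label 0).
balanced_rows, balanced_labels = oversample([[0.1], [0.0], [0.05], [0.6]], [0, 0, 0, 1])

# Three hypothetical classifiers vote on one candidate sentence.
decision = weighted_vote([1, 0, 1], [0.5, 0.3, 0.2])
```

Tuning the voting weights, as the third action suggests, would amount to trying different `weights` lists on held-out data and keeping the setting that identifies cited spans most accurately.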
