Abstract
Scientists must navigate the deluge of information contained in the articles published in their domain of research. Tools such as citation indexes link papers but do not indicate which passage of the cited paper is being referred to. In this study, we report our early attempts to design a framework for finding the sentences that are cited in a given article, a task we call citation linkage. We first discuss how we built a corpus annotated by domain experts. Then, using datasets consisting of all possible pairs of a citing sentence and a candidate sentence, in which the annotators deemed some candidates not to be cited and rated the others as cited with confidence ratings from 1 (lowest) to 5 (highest), we built regression models whose outputs are used to predict the degree of similarity for any pair of sentences in a target paper. Even though the Pearson correlation coefficient between the predicted values and the expected values is low (0.2759 with a linear regression model), we show that the citation linkage goal can be achieved. When we use the learning models to rank the predicted scores for sentences in a target article, 18 papers out of 22 have at least one relevant sentence ranked in the top k positions (k being the number of relevant sentences in that paper), and 10 papers (45%) have Normalized Discounted Cumulative Gain (NDCG) scores greater than 71% and Precision greater than 44%. Over all the papers, the mean NDCG is 47% and the Mean Average Precision is 29%.
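
As an illustration of the ranking-based evaluation described above, the following is a minimal Python sketch of how predicted scores for the sentences of one target paper could be ranked and then scored with Precision and NDCG at k. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes the common NDCG formulation with linear gain and a log2 position discount, that annotator ratings (0 for not cited, 1 to 5 otherwise) serve as graded relevance, and that k is the number of sentences with a nonzero rating.

```python
import math


def rank_by_score(scores, ratings):
    """Return annotator ratings reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [ratings[i] for i in order]


def precision_at_k(ranked_ratings, k):
    """Fraction of the top-k ranked sentences that are relevant (rating > 0)."""
    if k == 0:
        return 0.0
    return sum(1 for r in ranked_ratings[:k] if r > 0) / k


def dcg(ratings):
    """Discounted cumulative gain: linear gain, log2 discount (an assumption)."""
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(ratings))


def ndcg_at_k(ranked_ratings, k):
    """DCG of the top k divided by the DCG of the ideal top-k ordering."""
    ideal = sorted(ranked_ratings, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_ratings[:k]) / ideal_dcg


# Toy data for one hypothetical paper with four candidate sentences.
scores = [0.8, 0.2, 0.6, 0.1]   # model outputs (made up for illustration)
ratings = [0, 5, 3, 0]          # annotator ratings; 0 means "not cited"

ranked = rank_by_score(scores, ratings)
k = sum(1 for r in ratings if r > 0)  # k = number of relevant sentences
print(ranked, precision_at_k(ranked, k), ndcg_at_k(ranked, k))
```

In this toy case one of the two relevant sentences lands in the top k = 2 positions, so the paper would count toward the "at least one relevant sentence in the top k" tally even though its NDCG is low, because the highest-rated sentence was ranked third.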