Abstract
Background
Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, also known as hedges. We explore a linguistically motivated approach to the problem of recognizing such language in biomedical research articles. Our approach draws on prior linguistic work as well as existing lexical resources to create a dictionary of hedging cues and extends it by introducing syntactic patterns. Furthermore, recognizing that hedging cues differ in speculative strength, we assign them weights in two ways: automatically using the information gain (IG) measure and semi-automatically based on their types and centrality to hedging. Weights of hedging cues are used to determine the speculative strength of sentences.
Results
We test our system on two publicly available hedging datasets. On the fruit-fly dataset, we achieve a precision-recall breakeven point (BEP) of 0.85 using the semi-automatic weighting scheme and a lower BEP of 0.80 with the information gain weighting scheme. These results are competitive with the previously reported best results (BEP of 0.85). On the BMC dataset, using semi-automatic weighting yields a BEP of 0.82, a statistically significant improvement (p < 0.01) over the previously reported best result (BEP of 0.76), while information gain weighting yields a BEP of 0.70.
Conclusion
Our results demonstrate that speculative language can be recognized successfully with a linguistically motivated approach and confirm that selection of hedging devices affects the speculative strength of the sentence, which can be captured reasonably by weighting the hedging cues. The improvement obtained on the BMC dataset with a semi-automatic weighting scheme indicates that our linguistically oriented approach is more portable than the machine-learning based approaches.
Lower performance obtained with the information gain weighting scheme suggests that this method may benefit from a larger, manually annotated corpus for automatically inducing the weights.
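To make the weighting idea concrete, the following is a minimal sketch of how an information gain weight for a single hedging cue might be computed from labeled sentences, and how cue weights might then be combined into a sentence-level score. The function names, the whitespace tokenization, the toy sentences, and the choice to sum cue weights are all assumptions made for illustration; the abstract does not specify how the actual system tokenizes text or combines weights.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(sentences, labels, cue):
    """IG of the binary feature 'cue occurs in the sentence'
    with respect to the speculative (1) / non-speculative (0) label.
    Tokenization by lowercased whitespace split is an assumption."""
    has_cue = [cue in s.lower().split() for s in sentences]
    with_cue = [y for y, h in zip(labels, has_cue) if h]
    without_cue = [y for y, h in zip(labels, has_cue) if not h]
    n = len(labels)
    conditional = (len(with_cue) / n) * entropy(with_cue) \
        + (len(without_cue) / n) * entropy(without_cue)
    return entropy(labels) - conditional


def sentence_score(sentence, weights):
    """Hypothetical sentence score: the sum of the weights of all
    cues found in the sentence (summation is an assumption)."""
    return sum(weights.get(tok, 0.0) for tok in sentence.lower().split())


# Toy data, invented for illustration only.
sentences = [
    "this may indicate binding",
    "we suggest that x may act",
    "x binds y",
    "y activates z",
]
labels = [1, 1, 0, 0]

ig_may = information_gain(sentences, labels, "may")  # cue separates the classes perfectly
ig_x = information_gain(sentences, labels, "x")      # token is uninformative here
score = sentence_score("this may suggest a role", {"may": 1.0, "suggest": 0.5})
```

In a setup like this, a sentence would be flagged as speculative when its score exceeds a chosen threshold, and sweeping that threshold is one way to trace out a precision-recall curve from which a breakeven point could be read.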
Highlights
Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, known as hedges
Our results demonstrate that speculative language can be recognized successfully with a linguistically motivated approach and confirm that selection of hedging devices affects the speculative strength of the sentence, which can be captured reasonably by weighting the hedging cues
The improvement obtained on the BMC dataset with a semi-automatic weighting scheme indicates that our linguistically oriented approach is more portable than the machine-learning based approaches
Summary
Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, known as hedges, and scientific writing in biomedical research articles reflects this. (b) The lack of Cut expression in wild-type ventral cells abutting the D-V boundary indicates that D-mib is required for Ser signaling by dorsal cells and acts in a nonautonomous manner to activate N in ventral cells. These examples illustrate the phenomenon of hedging in the biomedical literature and highlight the difficulties in recognizing hedges. Hedging in the second sentence seems to be further marked by the subject of "indicates", namely "The lack of Cut expression in wild-type ventral cells abutting the D-V boundary".