Although uncertainties expressed in the text of quantitative structure-activity relationship (QSAR) studies can guide quantitative uncertainty estimation, they are often overlooked during uncertainty analysis. Using neurotoxicity as an example, this study developed a method to support the analysis of implicitly and explicitly expressed uncertainties in QSAR modeling studies. Text content analysis was employed to identify implicit and explicit uncertainty indicators, after which the uncertainties within the indicator-containing sentences were identified and systematically categorized according to 20 uncertainty sources. Our results show that implicit uncertainty was more frequent than explicit uncertainty within most uncertainty sources (13/20), whereas explicit uncertainty was more frequent in only three sources, indicating that uncertainty in the field is predominantly expressed implicitly. The most frequently cited sources included Mechanistic plausibility, Model relevance and Model performance, suggesting that these are the sources of greatest concern. Other sources, such as Data balance, were not mentioned at all, even though Data balance is recognized in the broader QSAR literature as an area of concern; this demonstrates that the output of this type of analysis must be interpreted in the context of the broader QSAR literature before conclusions are drawn. Overall, the method established here can be applied in other QSAR modeling contexts and can ultimately guide efforts to address the identified uncertainty sources.
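The tallying step behind a result such as "implicit uncertainty was more frequent in 13/20 sources" can be illustrated with a minimal sketch. It assumes that an analyst has already assigned each sentence to an uncertainty source, and it uses small hypothetical keyword lists to flag sentences as explicit or implicit; the study's actual indicators and coding procedure are not reproduced here. The functions `indicator_type`, `tally`, `dominant_type` and `unmentioned_sources` are illustrative names only.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical indicator lists, for illustration only; they are NOT the
# indicators used in the study.
EXPLICIT_INDICATORS = {"uncertain", "uncertainty", "uncertainties"}
IMPLICIT_INDICATORS = {"may", "might", "could", "possibly", "likely"}


def indicator_type(sentence: str) -> Optional[str]:
    """Classify a sentence as 'explicit', 'implicit', or None (no indicator).

    If both kinds of indicator occur, this sketch counts the sentence as explicit.
    """
    words = {w.strip(".,;:()\"'").lower() for w in sentence.split()}
    if words & EXPLICIT_INDICATORS:
        return "explicit"
    if words & IMPLICIT_INDICATORS:
        return "implicit"
    return None


def tally(coded: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count explicit/implicit indicator sentences per (analyst-assigned) source."""
    counts: Dict[str, Counter] = {}
    for sentence, source in coded:
        kind = indicator_type(sentence)
        if kind is not None:
            counts.setdefault(source, Counter())[kind] += 1
    return counts


def dominant_type(counts: Dict[str, Counter]) -> Dict[str, str]:
    """Report which kind of uncertainty expression dominates each source."""
    result = {}
    for source, c in counts.items():
        if c["implicit"] > c["explicit"]:
            result[source] = "implicit"
        elif c["explicit"] > c["implicit"]:
            result[source] = "explicit"
        else:
            result[source] = "tie"
    return result


def unmentioned_sources(counts: Dict[str, Counter], all_sources: List[str]) -> List[str]:
    """List sources from the full scheme that never appear in the coded text."""
    return [s for s in all_sources if s not in counts]
```

The last function reflects the caveat in the abstract: a frequency tally only sees what authors wrote, so a source such as Data balance shows up only as an absence, which must then be interpreted against the broader QSAR literature.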