Abstract
Quantitative molecular similarity analysis (QMSA) is a seemingly useful tool for estimating environmental properties of the hundreds of emerging contaminants that have not yet been fully evaluated. Calibrated QMSA models are also useful for prioritizing research among currently unmeasured chemicals of interest. Previous work has demonstrated that prioritization based on molecular ‘representativeness’, parameterized as summed Euclidean distances in n dimensions corresponding to n molecular descriptors, improves the prediction accuracy of QMSA models compared to random selection of the compounds to be measured. In this study, we use datasets for two environmental parameters (in vitro oestrogenicity and the sorption distribution coefficient, Kd) to demonstrate that maximizing representativeness alone cannot deliver optimal improvement in prediction accuracy if many of the chemicals that have already been measured are themselves highly representative. Proper QMSA-based prioritization among unmeasured chemicals therefore requires a balance between maximizing representativeness and minimizing redundancy. We demonstrate that redundancy considerations are especially critical for highly heterogeneous datasets, and we discuss how a proper balance between the two prioritization criteria might be achieved.
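As an illustration only, the two prioritization criteria can be sketched as follows. The abstract specifies that representativeness is parameterized using summed Euclidean distances in descriptor space; everything else here is an assumption made for the sketch and is not taken from the study: the function names, the use of the distance to the nearest measured chemical as a redundancy proxy, the normalization, and the linear `weight` that trades one criterion against the other.

```python
import math

def summed_distance(i, descriptors):
    """Summed Euclidean distance from chemical i to every other chemical.
    A smaller sum means chemical i is more 'representative'."""
    return sum(math.dist(descriptors[i], d)
               for j, d in enumerate(descriptors) if j != i)

def nearest_measured_distance(i, descriptors, measured):
    """Distance from chemical i to its closest already-measured chemical.
    A small value means measuring i would be largely redundant.
    (Assumed redundancy proxy; returns 0.0 if nothing has been measured.)"""
    return min((math.dist(descriptors[i], descriptors[m]) for m in measured),
               default=0.0)

def prioritize(descriptors, measured, weight=0.5):
    """Rank unmeasured chemicals, best candidate first.

    weight = 0 ranks by representativeness only; weight = 1 ranks by
    non-redundancy only. The linear blend is a hypothetical choice.
    """
    candidates = [i for i in range(len(descriptors)) if i not in measured]
    sums = {i: summed_distance(i, descriptors) for i in candidates}
    gaps = {i: nearest_measured_distance(i, descriptors, measured)
            for i in candidates}
    # Scale both criteria to [0, 1]; fall back to 1.0 to avoid dividing by zero.
    max_sum = max(sums.values(), default=0.0) or 1.0
    max_gap = max(gaps.values(), default=0.0) or 1.0

    def score(i):
        representativeness = 1.0 - sums[i] / max_sum
        non_redundancy = gaps[i] / max_gap
        return (1.0 - weight) * representativeness + weight * non_redundancy

    return sorted(candidates, key=score, reverse=True)

# Toy descriptor vectors (n = 2): two clusters, with chemical 0 already measured.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
measured = {0}
by_representativeness = prioritize(descriptors, measured, weight=0.0)
by_non_redundancy = prioritize(descriptors, measured, weight=1.0)
```

In this toy case, ranking by representativeness alone favours a chemical sitting next to the one already measured, whereas ranking by non-redundancy favours the unsampled cluster. Intermediate values of `weight` interpolate between the two behaviours.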