Abstract

OK, maybe the title is a little cheeky, but it does accurately and humorously convey a valuable learning experience that I recently had: a large dataset is absolutely critical for statistically significant results with tight confidence intervals. In this case, bigger really is better! Of course, more accurate data is better too. The Community Structure-Activity Resource (CSAR)¹ periodically holds exercises to allow scientists to test their docking and scoring methods. This issue of the Journal of Chemical Information and Modeling presents the papers that resulted from our most recent exercise, one based on blinded data. CSAR was very fortunate to receive large datasets of unpublished protein-ligand binding data from Abbott and Vertex. My concern was that there was too much data to use in an exercise; surely, it would take participants too long to accurately calculate all the possibilities. To make the exercise tractable in a limited period of time, I decided that we should use a smaller subset of data for the exercise and release the full set after it concluded. Unfortunately, reducing the size of the dataset made the error estimates very large, and it was very difficult to compare the results. To quote one of the participants, “Thanks, but no thanks!” To address this issue, we asked that participants submit papers to this issue of the Journal of Chemical Information and Modeling that present both their initial, blinded results and results based on the full dataset.
