Abstract

The ever-increasing rate of drug discovery data has complicated data analysis and potentially compromised data quality due to factors such as data handling errors. Parallel to this concern is the rise in blatant scientific misconduct. Combined, these problems highlight the importance of developing a method that can be used to systematically assess data quality. Benford's law has been used to discover data manipulation and data fabrication in various fields. In the authors' previous studies, it was demonstrated that the distribution of the corresponding activity and solubility data followed Benford's law distribution. It was also shown that too intense a selection of training data sets of regression model can disrupt Benford's law. Here, the authors present the application of Benford's law to a wider range of drug discovery data such as microarray and sequence data. They also suggest that Benford's law could also be applied to model building and reliability for structure-activity relationship study. Finally, the authors propose a protocol based on Benford's law which will provide researchers with an efficient method for data quality assessment. However, multifaceted quality control such as combinatorial use with data visualization may also be needed to further improve its reliability.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.