Abstract

The dimension of a database can be defined in several ways, ranging from standard tools such as principal component analysis to context-biased approaches. Effective dimensions, in particular those involving continua such as electron density data, provide important tools for comparing databases and for evaluating some aspects of database quality. Database comparisons and database mergers, such as the database unification required when two pharmaceutical companies merge, pose challenging tasks and opportunities for data science. Some of the tools for effective dimension reduction and dimension expansion are reviewed in the context of database quality control, and conditions for database compatibility are presented. A common misconception affecting data sampling techniques for data quality evaluation is discussed, and methods for circumventing the associated sampling errors are described.
