A methodology, derived by analogy to Shannon’s information-theoretic treatment of communication and utilizing the concept of mutual information, has been developed to characterize partitioned property spaces. A partitioned property space is represented by a family of non-intersecting subsets that cover the “universe” of objects; each subset is thus an equivalence class. A partition and its associated equivalence classes can be generated using any one of a number of procedures, including hierarchical and non-hierarchical clustering, direct approaches using rough set methods, and cell-based partitioning, to name a few. Thus, partitioned property spaces arise in many instances and represent a very large class of problems. The approach is based on set-valued mappings from equivalence classes in one partition to those in another and provides a coarse-grained means for comparing property spaces. From these mappings it is possible to compute a number of Shannon entropies that afford calculation of mutual information, which represents the amount of information shared by two partitions of a set of objects. Taking the ratio of the mutual information to the maximum possible mutual information yields a quantity that measures the similarity of the two partitions. While the focus in this work is directed towards small sets of objects, the approach can be extended to many more classes of problems that can be put into a similar form, which includes many types of cheminformatic and biological problems. A number of scenarios are presented that illustrate the concept and indicate the broader class of problems that can be handled by this method.
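The entropy and mutual-information calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, each partition is assumed to be given as a list of class labels (one per object), and the "maximum possible mutual information" is assumed here to be min(H(A), H(B)), which is a valid upper bound on I(A;B) but may differ from the normalization used in the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two labelings of the same objects."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same objects")
    # The joint partition is formed by pairing the two class labels of each object.
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def partition_similarity(a, b):
    """Mutual information divided by an ASSUMED bound, min(H(A), H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    bound = min(h_a, h_b)
    if bound == 0.0:
        # At least one partition has a single class and carries no information;
        # treat two single-class partitions as identical (a convention of this sketch).
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / bound


# Six objects partitioned identically, and four objects partitioned independently.
same = partition_similarity([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
independent = partition_similarity([0, 0, 1, 1], [0, 1, 0, 1])
```

Under these assumptions, identical partitions give a similarity of 1 and statistically independent partitions give 0; intermediate values indicate partial agreement between the two sets of equivalence classes.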