Abstract
An important problem in machine learning is that, when using more than two labels, it is very difficult to construct and optimize a group of learning functions that are still useful when the prior distribution of instances is changed. To resolve this problem, semantic information G theory, Logical Bayesian Inference (LBI), and a group of Channel Matching (CM) algorithms are combined to form a systematic solution. A semantic channel in G theory consists of a group of truth functions or membership functions. In comparison with the likelihood functions, Bayesian posteriors, and logistic functions that are typically used in popular methods, membership functions are more convenient to use, providing learning functions that do not suffer from the above problem. In LBI, every label is learned independently. For multilabel learning, we can directly obtain a group of optimized membership functions from a large enough sample with labels, without preparing different samples for different labels. Furthermore, a group of CM algorithms is developed for machine learning. For the Maximum Mutual Information (MMI) classification of three classes with Gaussian distributions in a two-dimensional feature space, only 2–3 iterations are required for the mutual information between three classes and three labels to surpass 99% of the MMI for most initial partitions. For mixture models, the Expectation-Maximization (EM) algorithm is improved to form the CM-EM algorithm, which can outperform the EM algorithm when the mixture ratios are imbalanced, or when local convergence exists. The CM iteration algorithm needs to be combined with neural networks for MMI classification in high-dimensional feature spaces. LBI needs further investigation for the unification of statistics and logic.
Highlights
Machine learning is based on learning functions and classifiers
Only 2–3 iterations were required for the mutual information to surpass 99% of the Maximum Mutual Information (MMI)
The following three examples show that the Channel Matching (CM)-EM algorithm can outperform both the EM
Summary
Machine learning is based on learning functions and classifiers. In 1922, Fisher [1] proposed Likelihood Inference (LI), which uses likelihood functions as learning functions and the Maximum Likelihood (ML) criterion to optimize the learning functions and classifiers (see abbreviations in this paper). When the prior distribution, P(x) (where x is an instance), is changed, the optimized likelihood function becomes invalid. As LI cannot make use of prior knowledge, Bayesians proposed Bayesian Inference (BI) during the 1950s [2,3], which uses Bayesian posteriors as learning functions. In many cases, however, we only have prior knowledge of instances, rather than of labels or model parameters, and BI still does not perform well in such cases.
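The sensitivity of likelihood functions to the prior can be illustrated with a minimal sketch. It is not taken from the paper: the function name `likelihood`, the three-instance setup, and all numbers are hypothetical and chosen only for illustration. The sketch assumes that the transition probabilities P(y_j | x) for one label stay fixed while P(x) changes, and applies Bayes' theorem to obtain P(x | y_j) under each prior.

```python
def likelihood(prior, p_label_given_x):
    """Return P(x | y_j) for every instance x via Bayes' theorem.

    prior            -- list of P(x) values, one per instance
    p_label_given_x  -- list of P(y_j | x) values for a single label y_j
    """
    joint = [p * t for p, t in zip(prior, p_label_given_x)]  # P(x, y_j)
    p_label = sum(joint)                                     # P(y_j)
    return [j / p_label for j in joint]


# Hypothetical numbers: three instances and one label y_j.
p_y_given_x = [0.1, 0.5, 0.9]   # assumed to stay fixed
prior_old = [0.5, 0.3, 0.2]     # P(x) when the function was fitted
prior_new = [0.2, 0.3, 0.5]     # P(x) after the prior changes

lik_old = likelihood(prior_old, p_y_given_x)
lik_new = likelihood(prior_new, p_y_given_x)

print("P(x | y_j) under old prior:", [round(v, 4) for v in lik_old])
print("P(x | y_j) under new prior:", [round(v, 4) for v in lik_new])
```

Under these assumptions the two printed distributions differ even though P(y_j | x) is unchanged, so a likelihood function fitted under the old prior no longer describes the data under the new one.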