Induction in Machine Learning

Daniel Harris, Darin Dunham

doi:10.1109/aero50100.2021.9438325

Abstract

Given today's data saturation, in which more data exist than anyone knows what to do with, too many who employ machine learning techniques treat machine learning algorithms as oracles to truths unreachable by the human mind. The decision boundaries these algorithms produce can achieve impressive performance during testing, but they are all too often accepted with an inadequate causal justification, assuming a causal explanation is attempted at all. This paper challenges the non-causal approach to machine learning, and for good reason. Abandoning causal justifications in favor of useful mathematical models for making predictions was attempted in astronomy over two thousand years ago, and those models led to fourteen centuries of stagnation and inexplicable prediction failures in that domain of science. The same approach has been attempted in other domains of science over the last two thousand years, and the result is always the same: stagnation and inexplicable prediction failures. This paper begins its challenge to the non-causal approach at the root, analyzing the machine learning process to reveal an error in its inductive method. It is shown that machine learning algorithms that employ this inductive method cannot be relied upon to learn valid generalizations. To avoid this invalid inductive method, a necessary modification to the machine learning process is presented, which assumes that a valid inductive method is available to form valid generalizations before machine learning algorithms are employed. A valid method of induction is then presented and shown to be governed by the law of causality. The paper then defines which features are causally valid to use for classification and describes a proper approach to searching for these features by leveraging machine learning algorithms.
Finally, an example contrasting the traditional and the modified machine learning processes is presented to illustrate the power of a causal approach. As a result, practitioners of this modified machine learning process have conscious control over their own success in domains that have reached valid generalizations, and over their own progress in domains that have yet to reach them.

Full Text