This paper proposes an information-theoretic method for interpreting the inference mechanism of neural networks. The method aims at a minimal interpretation by disentangling complex information into simpler, easily interpretable components. This disentanglement is realized by maximizing the mutual information between input patterns and the corresponding neurons. However, because mutual information is difficult to compute directly, we use the well-known autoencoder to increase it, re-interpreting the sparsity constraint as a device for increasing mutual information. The computational procedure for increasing mutual information is decomposed into a serial operation of two steps: the equal use of neurons and specific responses to input patterns, where the specific responses are obtained by enhancing the results of the equal use of neurons. The method was applied to three data sets: the glass, office-equipment, and pulsar data sets. With all three data sets, we observed that mutual information could be increased when the number of neurons was forced to increase. The collective weights, that is, weights averaged and treated collectively, then showed that the method could extract simple, linear relations between inputs and targets, making a minimal interpretation of the inference mechanism possible.
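The decomposition described above can be made concrete with a standard formulation of mutual information between input patterns and hidden neurons: MI = H(neurons) − H(neurons | patterns), where high marginal entropy corresponds to the equal use of neurons and low conditional entropy to specific responses. The following is a minimal illustrative sketch, not the paper's implementation; it assumes non-negative hidden activations normalized per pattern into firing probabilities, and the function name `mutual_information` is hypothetical.

```python
import numpy as np

def mutual_information(activations):
    """Mutual information between input patterns and hidden neurons.

    Rows of `activations` are input patterns, columns are hidden neurons.
    Activations are normalized per pattern into conditional firing
    probabilities p(j|s); the marginal p(j) averages these over patterns
    (uniform p(s) assumed).  MI = H(p(j)) - mean_s H(p(j|s)): equal use of
    neurons raises the marginal entropy, specific responses to each
    pattern lower the conditional entropy, and together they increase MI.
    """
    eps = 1e-12
    p_j_given_s = activations / (activations.sum(axis=1, keepdims=True) + eps)
    p_j = p_j_given_s.mean(axis=0)                 # marginal firing probabilities
    h_marginal = -np.sum(p_j * np.log(p_j + eps))  # entropy of neuron use
    h_conditional = -np.mean(                      # average per-pattern entropy
        np.sum(p_j_given_s * np.log(p_j_given_s + eps), axis=1)
    )
    return h_marginal - h_conditional

# Example: random non-negative activations for 100 patterns and 10 hidden neurons
acts = np.abs(np.random.randn(100, 10))
print(mutual_information(acts))
```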