Abstract

With the proliferation of deep artificial neural networks, techniques that allow end users to understand why a network came to a certain conclusion are becoming increasingly important. The lack of such understanding is a limiting factor for applying deep neural networks to critical tasks, where the price of error is high. It has recently been shown that the internal representations built by a deep neural network can sometimes be aligned with concepts of a domain ontology related to the network's target. This opens an opportunity to explain the results of a deep neural network in human terms (defined in the ontology). The paper presents the results of several experiments aimed at understanding which layers of a neural network are most promising for alignment with a given ontology concept (characterized by its relations with the network target). The experiments were performed with several datasets (XTRAINS, SCDB) and several network architectures (a custom convolutional neural network, ResNet, and MobileNetV2). For these dataset-architecture pairs we built "concept localization maps" showing how informative the output of each layer is for predicting that a given sample corresponds to a certain concept. The results show that concepts that are "closer" to the target concept (definition-wise) are typically better expressed (or localized) in the later layers. Moreover, concept expression across layers typically follows a roughly unimodal shape. We believe these results can be used to build effective algorithms for concept extraction and to improve ontology-based explanation techniques for deep neural networks.
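To make the idea of a "concept localization map" concrete, the sketch below (not the authors' code) shows one way such a map could be computed: for each layer of a trained network, pool its activations and fit a linear probe that predicts a binary concept label, then record the probe's validation quality per layer. The model, layer names, and data loaders are hypothetical placeholders.

```python
# A minimal sketch, assuming a PyTorch model and loaders that yield
# (image_batch, binary_concept_label) pairs. All names are illustrative.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def pool(out):
    # Global-average-pool spatial dims of conv feature maps; flatten otherwise.
    if out.dim() == 4:
        return out.mean(dim=(2, 3))
    return out.flatten(1)


@torch.no_grad()
def collect_features(model, layer_names, loader, device="cpu"):
    """Gather pooled activations of the named layers and the concept labels."""
    feats = {name: [] for name in layer_names}
    labels, handles, cache = [], [], {}
    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(
                lambda m, inp, out, name=name: cache.__setitem__(name, pool(out).cpu())))
    model.to(device).eval()
    for x, concept_y in loader:
        model(x.to(device))
        for name in layer_names:
            feats[name].append(cache[name].numpy())
        labels.append(concept_y.numpy())
    for h in handles:
        h.remove()
    y = np.concatenate(labels)
    return {name: np.concatenate(chunks) for name, chunks in feats.items()}, y


def concept_localization_map(train_feats, y_train, val_feats, y_val):
    """ROC AUC of a linear probe per layer: how well the concept is expressed there."""
    scores = {}
    for name in train_feats:
        probe = LogisticRegression(max_iter=1000).fit(train_feats[name], y_train)
        scores[name] = roc_auc_score(y_val, probe.predict_proba(val_feats[name])[:, 1])
    return scores


# Example usage (layer names here follow ResNet stage naming and are assumptions):
# layers = ["layer1", "layer2", "layer3", "layer4"]
# tr_f, tr_y = collect_features(model, layers, train_loader)
# va_f, va_y = collect_features(model, layers, val_loader)
# print(concept_localization_map(tr_f, tr_y, va_f, va_y))
```

Under this reading, a layer where the probe scores highest is where the concept is best "localized"; plotting the scores across layers would yield the roughly unimodal profile described above.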
