Abstract
This work examines the differences between humans and machines in object recognition tasks. A machine is useful only insofar as its output classification labels are correct and match the dataset-provided labels. Very often, however, a discrepancy occurs because the dataset label differs from the one a human would expect. To correct this, the concept of the target user population is introduced. The paper presents a complete methodology for either adapting the output of a pre-trained, state-of-the-art object classification algorithm to the target population or inferring a proper, user-friendly categorization from the target population. The process is called ‘user population re-targeting’. The methodology includes a set of specially designed population tests, which provide crucial data about the categorization that the target population prefers. The transformation between the dataset-bound categorization and the new, population-specific categorization is called the ‘Cognitive Relevance Transform’. Experiments on well-known datasets have shown that the target population preferred such a transformed categorization by a large margin, that the performance of human observers is probably better than previously thought, and that the outcome of re-targeting may be difficult to predict without actual tests on the target population.
Highlights
Humans have a different perception of categories than Convolutional Neural Networks (CNNs)
The null hypothesis was defined as follows: the performance of random Cognitive Relevance Transforms (CRTs) is equal to the measured performance for humans and a machine
Examples of images where the CNN clearly missed the category are shown in Figures 14 and 15
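The null hypothesis in the highlights compares a measured performance against that of random transforms. One way such a comparison could be probed is a simple Monte Carlo test, sketched below. This is an illustration under stated assumptions, not necessarily the procedure used in the paper: the performance measure (`transformed_accuracy`), the way a random transform is drawn (`random_transform`), and all function names are hypothetical.

```python
import random


def transformed_accuracy(true_labels, predicted_labels, transform):
    """Fraction of samples whose prediction maps to the same target category as the truth.

    Assumption: performance is plain accuracy measured after the transform.
    """
    hits = sum(
        transform[t] == transform[p]
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)


def random_transform(transform, rng):
    """Draw a random transform by shuffling the target categories among dataset labels.

    Assumption: a random transform keeps the same category sizes as the real one.
    """
    labels = list(transform)
    targets = [transform[label] for label in labels]
    rng.shuffle(targets)
    return dict(zip(labels, targets))


def empirical_p_value(true_labels, predicted_labels, transform, n_random=1000, seed=0):
    """Return the observed score and the share of random transforms scoring at least as well."""
    rng = random.Random(seed)
    observed = transformed_accuracy(true_labels, predicted_labels, transform)
    at_least = sum(
        transformed_accuracy(
            true_labels, predicted_labels, random_transform(transform, rng)
        ) >= observed
        for _ in range(n_random)
    )
    # Add-one smoothing keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_random + 1)
```

A small empirical p-value would suggest that the measured performance is unlikely under random transforms, which is the kind of evidence needed to reject the null hypothesis.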
Summary
Humans have a different perception of categories than Convolutional Neural Networks (CNNs). For one, CNNs use exclusively visual features to perform the classification. Humans think differently and explore different, high-level features in images to perform a classification task [2,3,4,5,6,7,8]. For humans, the two types of animals shown in Figure 1, one of them a reptile, fall into different categories, because humans take into account the high-level concept of affordances: perceivable action possibilities (i.e., only actions that depend on users’ physical capabilities, their goals, and past experiences) [9]. The algorithm, by contrast, focuses only on visual information (color, shape, and texture) [10]. Because both types of animals are similar in color and have a similar texture (see Figure 1), the algorithm places them in the same category despite their being semantically different. From the point of view of human perception, such a categorization is wrong and should be penalized more severely in the training or evaluation process.
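The idea of penalizing semantically wrong confusions more severely can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the label names, the `TRANSFORM` mapping from dataset labels to population-preferred categories, and the penalty weights `within` and `across` are hypothetical and are not taken from the paper.

```python
# Hypothetical mapping from dataset labels to categories a target population
# might prefer; these names are illustrative only.
TRANSFORM = {
    "green_lizard": "reptile",
    "alligator": "reptile",
    "tree_frog": "amphibian",
    "bullfrog": "amphibian",
}


def penalty(true_label, predicted_label, transform, within=1.0, across=3.0):
    """Penalty for one prediction.

    A correct label costs nothing, a confusion inside the same
    population-preferred category costs `within`, and a confusion across
    categories costs the larger `across`. Both weights are assumed values.
    """
    if predicted_label == true_label:
        return 0.0
    if transform[predicted_label] == transform[true_label]:
        return within
    return across


def mean_penalty(true_labels, predicted_labels, transform, within=1.0, across=3.0):
    """Average penalty over a set of predictions."""
    total = sum(
        penalty(t, p, transform, within, across)
        for t, p in zip(true_labels, predicted_labels)
    )
    return total / len(true_labels)


if __name__ == "__main__":
    truth = ["green_lizard", "tree_frog", "alligator"]
    preds = ["alligator", "green_lizard", "alligator"]
    print(mean_penalty(truth, preds, TRANSFORM))
```

Under this scheme, confusing two labels that the population groups together is a milder error than confusing labels from different groups, even when the images look alike.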