Abstract

This work examines the differences between a human and a machine in object recognition tasks. The machine is only as useful as its output classification labels are correct and match the dataset-provided labels. However, a discrepancy often occurs because the dataset label differs from the one a human expects. To correct this, the concept of the target user population is introduced. The paper presents a complete methodology for either adapting the output of a pre-trained, state-of-the-art object classification algorithm to the target population or inferring a proper, user-friendly categorization from the target population. The process is called ‘user population re-targeting’. The methodology includes a set of specially designed population tests, which provide crucial data about the categorization that the target population prefers. The transformation between the dataset-bound categorization and the new, population-specific categorization is called the ‘Cognitive Relevance Transform’. Experiments on well-known datasets show that the target population preferred the transformed categorization by a large margin, that the performance of human observers is probably better than previously thought, and that the outcome of re-targeting may be difficult to predict without actual tests on the target population.

Highlights

  • Humans have a different perception of categories than Convolutional Neural Networks (CNNs)

  • The null hypothesis was defined as: the performance of random Cognitive Relevance Transforms (CRTs) is equal to the measured performance for humans and a machine

  • Examples of images where CNN clearly missed the category are shown in Figures 14 and 15
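As a rough sketch of how a null hypothesis of this kind could be tested, the following Monte Carlo comparison pits a fixed, population-derived mapping against randomly drawn CRTs. All labels, data, and the mapping itself are invented for illustration and are not taken from the paper:

```python
import random

# Illustrative randomization test: accuracy under random CRT mappings
# vs. a fixed, "measured" (population-derived) mapping. All names and
# data below are hypothetical.
random.seed(0)

fine_labels = ["green_lizard", "green_snake", "tennis_ball", "croquet_ball"]
measured_crt = {"green_lizard": "reptile", "green_snake": "reptile",
                "tennis_ball": "ball", "croquet_ball": "ball"}

def accuracy(preds, truth, mapping):
    """Fraction of predictions that agree with the truth after re-mapping."""
    return sum(mapping[p] == mapping[t] for p, t in zip(preds, truth)) / len(truth)

def random_crt(labels, categories=("reptile", "ball", "vehicle", "tool")):
    """Assign each fine-grained label to a category uniformly at random."""
    return {label: random.choice(categories) for label in labels}

# Toy classifier output whose confusions stay within semantic groups.
truth = ["green_lizard", "green_snake", "tennis_ball", "croquet_ball"] * 25
preds = ["green_snake", "green_lizard", "croquet_ball", "tennis_ball"] * 25

measured = accuracy(preds, truth, measured_crt)
random_accs = [accuracy(preds, truth, random_crt(fine_labels))
               for _ in range(1000)]
# One-sided p-value: fraction of random CRTs doing at least as well.
p_value = sum(a >= measured for a in random_accs) / len(random_accs)
print(measured, p_value)
```

A small p-value here would indicate that the measured mapping outperforms chance-level groupings, i.e., that the null hypothesis can be rejected.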


Summary

Introduction

Humans have a different perception of categories than Convolutional Neural Networks (CNNs). For one, CNNs rely exclusively on visual features to perform classification. Humans think differently and exploit different, high-level features of images to perform a classification task [2,3,4,5,6,7,8]. To a human, the two animals in Figure 1 fall into different categories, because humans take into account the high-level concept of affordances: perceivable action possibilities, i.e., actions that depend on users’ physical capabilities, their goals, and past experiences [9]. The algorithm, in contrast, focuses only on visual information (color, shape, and texture) [10]. Because both types of animals are similar in color and have a similar texture (see Figure 1), they fall into the same category despite being semantically different. From the human-perception point of view, such a categorization is wrong and should be penalized more severely (with a greater penalty) in the training or evaluation process.
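Conceptually, the Cognitive Relevance Transform described above can be pictured as a re-mapping from dataset-bound labels onto population-preferred categories, applied to both predictions and ground truth before scoring. The sketch below is a minimal illustration with an invented mapping, not the paper's actual transform, which is inferred from population tests:

```python
# Minimal sketch of a CRT as a label re-mapping; the mapping below is
# invented for illustration only.
crt = {
    "green_lizard": "reptile",
    "green_snake": "reptile",
    "tennis_ball": "ball",
    "croquet_ball": "ball",
}

def crt_accuracy(predictions, ground_truth, mapping):
    """Accuracy after both predictions and labels pass through the mapping."""
    hits = sum(mapping.get(p, p) == mapping.get(t, t)
               for p, t in zip(predictions, ground_truth))
    return hits / len(ground_truth)

# A classifier that confuses visually similar, semantically close labels.
preds = ["green_snake", "tennis_ball", "green_lizard"]
truth = ["green_lizard", "tennis_ball", "green_snake"]

print(crt_accuracy(preds, truth, {}))   # scored on dataset-bound labels
print(crt_accuracy(preds, truth, crt))  # scored on population-preferred categories
```

Under the identity mapping only one of the three predictions counts as correct, while under the coarser, population-preferred categories all three do, which illustrates why re-targeted scoring can make human (and machine) performance look better than the raw dataset labels suggest.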

Methods
Results
Conclusion
