Abstract

In this paper, we propose a self-adaptive deep neural network architecture suitable for object tracking and labelling. In particular, an adaptation mechanism is introduced that automatically evaluates the performance of the network and then updates its weights to fit the current environmental conditions. The retraining algorithm trusts the current conditions as much as possible (discriminative constraints), while causing minimal degradation of the knowledge the network has already acquired (generative constraints). The underlying assumption is that a small weight perturbation is sufficient to adapt the network to new situations, without this implying an equally small change in the network output, owing to the highly non-linear surface that the network models. Under this assumption, we propose a computationally efficient retraining algorithm that tackles variations of the visual environment while requiring only a small number of labelled samples for the adaptation. Weight updating is combined with an unsupervised learning paradigm, implemented through stacked autoencoders, in order to improve the convergence, stability and performance of the object tracking and labelling process by propagating the sensory inputs into deeper levels of the hierarchy and thereby structuring them from low-level representations into more abstract forms. Approximations of the current visual environment are provided by a dynamic tracker that combines motion and learning features to automatically generate a few confidently labelled samples. The proposed retraining scheme is computationally efficient and able to model non-stationary environments, such as those encountered in real-life computer vision applications. Experimental results and comparisons are provided on video datasets of highly complex visual content, monitoring industrial car-assembly workflows in a manufacturing plant. The results indicate that our self-adaptive deep neural network architecture is able to correctly label foreground objects and separate them from the background even under severe visual changes, such as occlusions, illumination variations and changes of camera view, while meeting real-time computational constraints.
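
To make the adaptation step concrete, the sketch below gives one possible reading of the retraining objective summarised above: a few confidently labelled samples produced by the tracker drive a discriminative loss, while proximity to the previous weights and to the network's behaviour on a small memory of earlier samples stands in for the generative (knowledge-preservation) constraints. The network architecture, loss weights, data shapes and the use of PyTorch are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (assumed formulation, not the paper's exact algorithm):
# one adaptation step that fits a few tracker-labelled samples while keeping
# the weight perturbation small and preserving behaviour on remembered data.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelClassifier(nn.Module):
    """Small classifier head; in the paper the features come from stacked
    autoencoders, here a plain MLP stands in for that hierarchy."""

    def __init__(self, in_dim=64, hidden=32, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


def adapt(model, new_x, new_y, memory_x, memory_y,
          lam_old=1.0, lam_w=0.1, steps=20, lr=1e-3):
    """Retrain with a small weight perturbation.

    - discriminative term: fit the few confidently labelled samples (new_x, new_y)
    - generative terms: stay close to the previous weights and to the previous
      behaviour on remembered samples (memory_x, memory_y)
    """
    old_params = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_new = F.cross_entropy(model(new_x), new_y)        # current conditions
        loss_old = F.cross_entropy(model(memory_x), memory_y)  # acquired knowledge
        loss_w = sum(((p - q) ** 2).sum()                      # small perturbation
                     for p, q in zip(model.parameters(), old_params))
        (loss_new + lam_old * loss_old + lam_w * loss_w).backward()
        opt.step()
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    model = PixelClassifier()
    # A handful of tracker-labelled samples for the current conditions ...
    new_x, new_y = torch.randn(8, 64), torch.randint(0, 2, (8,))
    # ... and a small memory of samples representing the earlier knowledge.
    mem_x, mem_y = torch.randn(32, 64), torch.randint(0, 2, (32,))
    adapt(model, new_x, new_y, mem_x, mem_y)
```

The L2 proximity term and the small replay memory are only one way of expressing "minimal degradation of the already acquired knowledge"; they are used here solely to illustrate how few labelled samples the adaptation step needs.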
