Clothing image recognition has recently received considerable attention from several communities, such as multimedia information processing and computer vision, owing to its commercial and social applications. However, the large variations in the appearance and style of clothing images, together with their complicated imaging conditions, make the problem challenging. In addition, a generic treatment with convolutional neural networks (CNNs) cannot provide a satisfactory solution when both training time and recognition performance are considered. How to balance these two factors in clothing image recognition is therefore an interesting problem. Motivated by the fast training and straightforward solutions offered by extreme learning machines (ELMs), in this paper we propose a recognition framework based on multiple sources of features and ELM neural networks. In this framework, three types of features are first extracted: CNN features from pre-trained networks, histograms of oriented gradients, and color histograms. Second, these low-level features are concatenated and fed into an autoencoder version of the ELM for deep feature-level fusion. Third, we propose an ensemble of adaptive ELMs for decision-level fusion, operating on the previously obtained feature-level fusion representations. Extensive experiments are conducted on an up-to-date large-scale clothing image data set. The experimental results show that the proposed framework is competitive and efficient.
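The central fusion step can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, not the authors' implementation: it assumes hypothetical feature dimensions (stand-in random matrices in place of real CNN, HOG, and color-histogram features) and uses the standard ELM-autoencoder recipe of random hidden weights plus a closed-form, ridge-regularized solve for the output weights, with `X @ beta.T` serving as the fused representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the three concatenated low-level feature sources
# (dimensions are illustrative, not taken from the paper):
X = np.hstack([
    rng.standard_normal((100, 512)),  # stand-in for pre-trained CNN features
    rng.standard_normal((100, 144)),  # stand-in for HOG features
    rng.standard_normal((100, 64)),   # stand-in for color histograms
])

def elm_autoencoder(X, n_hidden=256, reg=1e-3, seed=1):
    """ELM autoencoder: random hidden layer, closed-form output weights.

    The target equals the input (autoencoding). The learned output
    weights beta map hidden activations back to X; projecting X onto
    beta (X @ beta.T) yields the fused, lower-dimensional representation.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden activations
    # Ridge-regularized least squares: beta = (H^T H + reg*I)^{-1} H^T X
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                                # fused features

Z = elm_autoencoder(X)
print(Z.shape)  # (100, 256)
```

The fused matrix `Z` would then feed the decision-level stage, where each ELM in the ensemble is trained on these representations and their outputs are combined.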