Abstract

A novel technology, the fashion intelligence system, which quantifies ambiguous expressions unique to fashion, such as “casual,” “adult-casual,” and “office-casual,” was previously proposed to support users in understanding fashion. However, the existing visual-semantic embedding (VSE) model on which the system is based does not support images composed of multiple parts, such as those containing hair, tops, trousers, skirts, and shoes. We therefore propose a partial VSE (PVSE) model, which enables fine-grained learning of each part of a fashion outfit. The proposed model learns embedded representations via angular-based contrastive learning. This retains the three practical functionalities of the existing system and further enables image-retrieval tasks in which only specified parts are changed, as well as image-reordering tasks that focus on specified parts. In other words, the proposed model supports five practical functionalities despite its simple structure. Through qualitative and quantitative experiments, we demonstrate that the proposed model outperforms conventional models without increasing computational complexity.
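The abstract does not specify the loss function, but an angular-based contrastive objective over visual-semantic embeddings can be sketched as below. This is a minimal illustration assuming PyTorch; the function name, the margin value, and the use of in-batch negatives are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def angular_contrastive_loss(image_emb: torch.Tensor,
                             tag_emb: torch.Tensor,
                             margin: float = 0.2) -> torch.Tensor:
    """Hinge-style contrastive loss on cosine (angular) similarity.

    image_emb, tag_emb: (batch, dim) embeddings where row i of each
    tensor forms a matching image/tag pair; all off-diagonal pairs in
    the batch are treated as negatives.
    """
    # Normalise so that dot products equal cosine similarities (angles).
    image_emb = F.normalize(image_emb, dim=1)
    tag_emb = F.normalize(tag_emb, dim=1)

    sim = image_emb @ tag_emb.t()      # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)      # similarity of each positive pair

    # Penalise any negative whose similarity comes within `margin`
    # of the corresponding positive pair's similarity.
    loss = F.relu(margin + sim - pos)
    loss.fill_diagonal_(0)             # positives incur no penalty
    return loss.mean()
```

With perfectly separated pairs (e.g. orthogonal embeddings matched to themselves) the loss is zero; mismatched pairs yield a positive penalty, pushing each image embedding toward its own tag embedding and away from the others by at least the margin, in angle.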
