Transform, Warp, and Dress: A New Transformation-guided Model for Virtual Try-on
Virtual try-on has recently emerged in the computer vision and multimedia communities with the development of architectures that can generate realistic images of a target person wearing a custom garment. This research interest is motivated by the large role played by e-commerce and online shopping in our society. Indeed, the virtual try-on task offers many opportunities to improve the efficiency of preparing fashion catalogs and to enhance the online user experience. The problem is far from solved: current architectures do not reach sufficient accuracy with respect to manually generated images and can only be trained on image pairs with limited variety. Existing virtual try-on datasets have two main limitations: they contain only female models, and all images are available only at low resolution. This not only affects the generalization capabilities of the trained architectures but also makes deployment in real applications impractical. To overcome these issues, we present Dress Code, a new dataset for virtual try-on that contains high-resolution images of a large variety of upper-body clothes and both male and female models. Leveraging this enriched dataset, we propose a new model for virtual try-on capable of generating high-quality and photo-realistic images using a three-stage pipeline. The first two stages perform two different geometric transformations to warp the desired garment and fit it to the target person's body pose and shape. Then, we generate the new image of that same person wearing the try-on garment using a generative network. We test the proposed solution on the most widely used dataset for this task as well as on our newly collected dataset, and demonstrate its effectiveness compared to current state-of-the-art methods. Through extensive analyses on our Dress Code dataset, we show the adaptability of our model, which can generate try-on images even at higher resolutions.
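The three-stage pipeline summarized above can be sketched as a simple dataflow. The toy transformations below (a global rescale, a local shift, and a hard masked composition on a grayscale array) are illustrative stand-ins for the paper's learned warping and generative modules, not its actual implementation:

```python
import numpy as np

def coarse_warp(garment: np.ndarray, scale: float) -> np.ndarray:
    """Stage 1 (illustrative): a global geometric transformation that
    rescales the in-shop garment toward the target body proportions,
    via nearest-neighbour index sampling."""
    h, w = garment.shape
    rows = np.clip((np.arange(int(h * scale)) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(int(w * scale)) / scale).astype(int), 0, w - 1)
    return garment[np.ix_(rows, cols)]

def refine_warp(garment: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Stage 2 (illustrative): a local transformation that shifts the
    coarsely warped garment into alignment with the body pose."""
    return np.roll(garment, shift=(dy, dx), axis=(0, 1))

def try_on(person: np.ndarray, warped: np.ndarray) -> np.ndarray:
    """Stage 3 (illustrative): the generative stage, reduced here to a
    masked composition of the warped garment onto the person image."""
    mask = warped > 0  # nonzero pixels belong to the garment
    out = person.copy()
    out[mask] = warped[mask]
    return out
```

In the actual model, each stage is a trained network and the final image comes from a generator rather than a hard mask.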
- Research Article
4
- 10.5934/kjhe.2009.18.3.719
- Jun 30, 2009
- Korean Journal of Human Ecology
The ultimate success of commercial applications of body scan data in the apparel industry will depend on substantial consumer applications such as automated custom fit, size prediction, virtual try-on, and personal shopper services (Loker et al., 2004). In this study, we surveyed fifty consumers and forty-seven apparel industry workers about their recognition of and interest in 3D body scanning and virtual try-on. The results are as follows: 55% of the apparel industry workers recognized 3D body scanning as a convenient technology but did not know how to use it. To the questions regarding virtual try-on, 53% of the workers gave positive answers. The consumers held a more positive view of virtual try-on than the workers did. The workers predicted that applying 3D body scan technology to the apparel industry could offer customers helpful information for clothing selection, by using virtual images of various sizes and styles, and could increase mass production of MTM (Made-To-Measure) garments. The answers from male consumers in their twenties indicate that virtual try-on is considered useful by 88% for offline shopping and by 100% for online shopping. 53% of the workers and 68% of the consumers answered that, based on virtual try-on alone, they could judge the quality of apparel products and purchase them. 3D virtual try-on is thus an effective tool for online shoppers. 85% of the workers anticipate applications of 3D body scanning in 'body measurement' and 'custom pattern development' as well as 'virtual try-on' in the near future. Given these positive reactions and the growing interest in virtual try-on, current conditions encourage more active research on, and wider use of, the technology in the apparel industry.
- Research Article
10
- 10.1007/s00521-024-10843-6
- Jan 9, 2025
- Neural Computing and Applications
In today’s digital age, consumers increasingly rely on online shopping for its convenience and accessibility. However, a significant drawback of online shopping is the inability to physically try on clothing before purchasing. This limitation often leads to uncertainty regarding fit and style, resulting in post-purchase dissatisfaction and higher return rates. Research indicates that items bought online are three times more likely to be returned than those bought in-store, especially during the pandemic. To address this challenge, we propose FITMI, an enhanced Latent Diffusion Textual Inversion model for virtual try-on. The proposed architecture aims to bridge the gap between traditional in-store try-ons and online shopping by offering users a realistic and interactive virtual try-on experience. Although virtual try-on solutions already exist, recent advances in artificial intelligence have significantly enhanced their capabilities, enabling more sophisticated and realistic virtual try-on experiences than ever before. Building on these advances, FITMI surpasses ordinary virtual try-on systems that rely on generative adversarial networks, which often produce unrealistic outputs; instead, FITMI utilizes latent diffusion models to generate high-quality images with detailed textures. As a web application, FITMI facilitates virtual try-ons by seamlessly integrating images of users with garments from catalogs, providing a true-to-life representation of how the items would look; this approach differentiates it from competitors. FITMI is validated on two widely recognized benchmarks: the Dress-Code and Viton-HD datasets. Additionally, FITMI acts as a trusted style advisor, enhancing the shopping experience by recommending complementary items to elevate the chosen garment and suggesting similar options based on user preferences.
- Research Article
6
- 10.1016/j.imavis.2022.104568
- Nov 1, 2022
- Image and Vision Computing
ST-VTON: Self-supervised vision transformer for image-based virtual try-on
- Research Article
11
- 10.3390/s20195647
- Oct 2, 2020
- Sensors
Virtual try-on is the ability to realistically superimpose clothing onto a target person. Due to its importance to the multi-billion-dollar e-commerce industry, the problem has received significant attention in recent years. To date, most virtual try-on methods have been supervised approaches, i.e., they rely on annotated data such as clothes-parsing semantic segmentation masks and paired images. These approaches incur a very high annotation cost. Even existing weakly-supervised virtual try-on methods still use annotated data or pre-trained networks as auxiliary information, and their annotation costs remain significantly high. Moreover, relying on pre-trained networks is inappropriate in practical scenarios due to latency. In this paper we propose Unsupervised VIRtual Try-on using disentangled representation (UVIRT). UVIRT extracts a clothes feature and a person feature from a clothes image and a person image, respectively, and then exchanges the two features to achieve virtual try-on. This is all done in an unsupervised manner, so UVIRT has the advantage that it requires no annotated data, pre-trained networks, or even category labels. In the experiments, we qualitatively and quantitatively compare supervised methods against UVIRT on the MPV dataset (which has paired images) and on a Consumer-to-Consumer (C2C) marketplace dataset (which has unpaired images). UVIRT outperforms the supervised method on the C2C marketplace dataset and achieves results comparable to the conventional supervised method on the MPV dataset.
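The feature-exchange mechanism described here can be illustrated with a toy linear model whose latent representation is split into a person part and a clothes part; the encoders, decoder, and dimensions below are hypothetical stand-ins for UVIRT's learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4                            # toy image and code dimensions
enc_person = rng.normal(size=(K, D))    # hypothetical person-attribute encoder
enc_clothes = rng.normal(size=(K, D))   # hypothetical clothes-attribute encoder
dec = rng.normal(size=(D, 2 * K))       # hypothetical decoder

def encode(image: np.ndarray) -> dict:
    """Disentangle an image vector into a person code and a clothes code."""
    return {"person": enc_person @ image, "clothes": enc_clothes @ image}

def swap_try_on(person_img: np.ndarray, clothes_img: np.ndarray) -> np.ndarray:
    """Keep the person code of the person image, take the clothes code of
    the clothes image, and decode the exchanged pair."""
    z = np.concatenate([encode(person_img)["person"],
                        encode(clothes_img)["clothes"]])
    return dec @ z
```

In UVIRT the encoders and decoder are deep networks trained without paired supervision; the linear maps here only make the code-swap dataflow explicit.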
- Conference Article
11
- 10.1109/iccvw.2019.00161
- Oct 1, 2019
With the rapid growth of online commerce, image-based virtual try-on systems, which fit new in-shop garments onto a person image, present an exciting opportunity to deliver an interactive customer experience. Current state-of-the-art methods achieve this in a two-stage pipeline, where the first stage transforms the in-shop cloth to fit the body shape of the target person and the second stage employs an image composition module to seamlessly integrate the transformed in-shop cloth into the target person image. In the present work, we introduce a multi-scale patch adversarial loss for training the warping module of a state-of-the-art virtual try-on network. We show that the proposed loss produces robust transformations of clothes to fit the body shape while preserving texture details, which in turn improves image composition in the second stage. We perform extensive evaluations of the proposed loss on try-on performance and show significant improvement over the existing state-of-the-art method.
- Conference Article
61
- 10.1109/iccv48922.2021.01299
- Oct 1, 2021
Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has a huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their applications in practical scenarios. 2D virtual try-on approaches provide a faster alternative to manipulate clothed humans, but lack the rich and realistic 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating 2D information efficiently and learning a mapping that lifts the 2D representation to 3D, we make the first attempt to reconstruct a 3D try-on mesh only taking the target clothing and a person image as inputs. The proposed M3D-VTON includes three modules: 1) The Monocular Prediction Module (MPM) that estimates an initial full-body depth map and accomplishes 2D clothes-person alignment through a novel two-stage warping procedure; 2) The Depth Refinement Module (DRM) that refines the initial body depth to produce more detailed pleat and face characteristics; 3) The Texture Fusion Module (TFM) that fuses the warped clothing with the non-target body part to refine the results. We also construct a high-quality synthesized Monocular-to-3D virtual try-on dataset, in which each person image is associated with a front and a back depth map. Extensive experiments demonstrate that the proposed M3D-VTON can manipulate and reconstruct the 3D human body wearing the given clothing with compelling details and is more efficient than other 3D approaches.
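The 2D-to-3D lifting at the heart of M3D-VTON hinges on the estimated depth maps: given per-pixel depth and camera intrinsics (assumed pinhole here), each pixel back-projects to a 3D point. A minimal sketch of that geometric step, not the paper's actual code:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map to an (H*W, 3) point cloud using the
    standard pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Applied to the front and back depth maps the method predicts, this kind of back-projection yields the double-sided point sets from which a try-on mesh can be reconstructed.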
- Conference Article
22
- 10.1109/icpr48806.2021.9412052
- Jan 10, 2021
The large spread of online shopping has led computer vision researchers to develop different solutions for the fashion domain to potentially increase the online user experience and improve the efficiency of preparing fashion catalogs. Among them, image-based virtual try-on has recently attracted a lot of attention resulting in several architectures that can generate a new image of a person wearing an input try-on garment in a plausible and realistic way. In this paper, we present VITON-GT, a new model for virtual try-on that generates high-quality and photo-realistic images thanks to multiple geometric transformations. In particular, our model is composed of a two-stage geometric transformation module that performs two different projections on the input garment, and a transformation-guided try-on module that synthesizes the new image. We experimentally validate the proposed solution on the most common dataset for this task, containing mainly t-shirts, and we demonstrate its effectiveness compared to different baselines and previous methods. Additionally, we assess the generalization capabilities of our model on a new set of fashion items composed of upper-body clothes from different categories. To the best of our knowledge, we are the first to test virtual try-on architectures in this challenging experimental setting.
- Video Transcripts
- 10.48448/g7fw-3613
- Dec 29, 2020
- Underline Science Inc.
- Conference Article
115
- 10.1109/iccv.2019.00125
- Oct 1, 2019
Beyond current image-based virtual try-on systems that have attracted increasing attention, we move a step forward to developing a video virtual try-on system that precisely transfers clothes onto the person and generates visually realistic videos conditioned on arbitrary poses. Besides the challenges in image-based virtual try-on (e.g., clothes fidelity, image synthesis), video virtual try-on further requires spatiotemporal consistency. Directly adopting existing image-based approaches often fails to generate coherent video with natural and realistic textures. In this work, we propose Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize the video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. FW-GAN aims to synthesize the coherent and natural video while manipulating the pose and clothes. It consists of: (i) a flow-guided fusion module that warps the past frames to assist synthesis, which is also adopted in the discriminator to help enhance the coherence and quality of the synthesized video; (ii) a warping net that is designed to warp clothes image for the refinement of clothes textures; (iii) a parsing constraint loss that alleviates the problem caused by the misalignment of segmentation maps from images with different poses and various clothes. Experiments on our newly collected dataset show that FW-GAN can synthesize high-quality video of virtual try-on and significantly outperforms other methods both qualitatively and quantitatively.
- Research Article
4
- 10.1016/j.neunet.2024.106353
- May 1, 2024
- Neural Networks
A novel garment transfer method supervised by distilled knowledge of virtual try-on model
- Research Article
33
- 10.1002/cb.2158
- Mar 20, 2023
- Journal of Consumer Behaviour
The study aims to empirically examine consumers' online apparel purchasing behavior using constructs from the technology acceptance model (UTAUT). The complex interrelationships between perceived usefulness, perceived risk, perceived enjoyment, and virtual try-on (VTO) technology were explored using a moderated moderated-mediation model. Most importantly, this research focuses on how VTO, one of the frequently used disruptive technologies, influences consumer behavior. Using a structured survey instrument, data were collected from 288 millennial respondents and analyzed using Hayes's PROCESS macros. The results reveal that attitude toward VTO mediated the relationship between perceived usefulness and customers' behavioral intention to engage in online shopping. Perceived risk (the first moderator) negatively moderated the relationship between perceived usefulness and attitude toward VTO, while perceived enjoyment (the second moderator) positively moderated the relationship between perceived usefulness, perceived risk, and behavioral intention, mediated through attitude toward VTO. The theoretical and practical implications are also discussed.
- Research Article
1
- 10.1108/ijrdm-01-2025-0060
- Nov 10, 2025
- International Journal of Retail & Distribution Management
Purpose Virtual try-on (VTO) technology offers consumers a shopping experience comparable to direct product examination by providing detailed product information and enhancing enjoyment during online shopping. Against this backdrop, this study extends the technology acceptance model (TAM) to investigate the factors influencing consumers’ attitudes toward VTO technology and their willingness to purchase online. Design/methodology/approach Data were collected through an online survey of 228 Italian respondents. The proposed research model was tested through an exploratory factor analysis (EFA) and a confirmatory factor analysis (CFA), followed by a structural equation model (SEM) with an ordered Probit approach. Findings The results highlight the significant impact of perceived enjoyment (PE), innovativeness and perceived environmental benefits (PEBs) on consumer attitudes toward VTO and their online purchase intentions. This study underscores the role of VTO in enhancing online shopping experiences, leveraging both utilitarian and hedonic values, ultimately encouraging technology adoption. Research limitations/implications The findings offer valuable insights for retailers seeking to encourage online shopping using VTO technology, fostering PE, stimulating interest in innovative solutions and promoting responsible consumption choices. Originality/value This research extends the TAM by integrating external variables – innovativeness and PEBs – into the analysis, while accounting for the mediating role of PE on consumer behaviour.
- Research Article
30
- 10.1109/tmm.2022.3143712
- Jan 1, 2022
- IEEE Transactions on Multimedia
Image-based virtual try-on is challenging: it must fit target in-shop clothes onto a reference person under diverse human poses. Previous works focus on preserving clothing details (e.g., texture, logos, patterns) when transferring desired clothes onto a target person under a fixed pose. However, the performance of existing methods drops significantly when they are extended to multi-pose virtual try-on. In this paper, we propose an end-to-end Semantic Prediction Guidance multi-pose Virtual Try-On Network (SPG-VTON), which can fit the desired clothing onto a reference person under arbitrary poses. Specifically, SPG-VTON is composed of three sub-modules. First, a Semantic Prediction Module (SPM) generates the desired semantic map. The predicted semantic map provides more abundant guidance to locate the desired clothing region and produce a coarse try-on image. Second, a Clothes Warping Module (CWM) warps in-shop clothes to the desired shape according to the predicted semantic map and the desired pose. Specifically, we introduce a conductible cycle consistency loss to alleviate the misalignment in the clothing warping process. Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, preserving the details of the desired clothes under the desired pose. In addition, we introduce a face identity loss to refine the facial appearance while maintaining the identity of the final virtual try-on result. We evaluate the proposed method on the largest multi-pose dataset (MPV) and the DeepFashion dataset.
The qualitative and quantitative experiments show that SPG-VTON is superior to state-of-the-art methods and is robust to data noise, including background and accessory changes (i.e., hats and handbags), showing good scalability to real-world scenarios.
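The general idea behind a cycle-consistency loss for clothes warping can be illustrated with a translation-only warp, a deliberately simplified stand-in for the learned warp the abstract refers to: warp forward, invert the warp, and penalize the residual against the original image.

```python
import numpy as np

def warp(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Toy warp: a pure translation (stand-in for a learned flow/TPS warp)."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def cycle_consistency_loss(img: np.ndarray, dy: int, dx: int) -> float:
    """Mean L1 residual between the image and its forward-then-inverse warp.
    An exactly invertible warp gives zero; a learned warp is trained to
    drive this residual down."""
    round_trip = warp(warp(img, dy, dx), -dy, -dx)
    return float(np.abs(img - round_trip).mean())
```

A translation is exactly invertible, so the loss here is identically zero; for a learned, non-rigid warp the residual is generally nonzero and serves as a training signal.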
- Conference Article
241
- 10.1109/iccv.2019.00912
- Oct 1, 2019
Virtual try-on systems under arbitrary human poses have significant application potential, yet also raise extensive challenges, such as self-occlusions, heavy misalignment among different poses, and complex clothes textures. Existing virtual try-on methods can only transfer clothes given a fixed human pose, and still show unsatisfactory performances, often failing to preserve person identity or texture details, and with limited pose diversity. This paper makes the first attempt towards a multi-pose guided virtual try-on system, which enables clothes to transfer onto a person with diverse poses. Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-On Network (MG-VTON) generates a new person image after fitting the desired clothes into the person and manipulating the pose. MG-VTON is constructed with three stages: 1) a conditional human parsing network is proposed that matches both the desired pose and the desired clothes shape; 2) a deep Warping Generative Adversarial Network (Warp-GAN) that warps the desired clothes appearance into the synthesized human parsing map and alleviates the misalignment problem between the input human pose and the desired one; 3) a refinement render network recovers the texture details of clothes and removes artifacts, based on multi-pose composition masks. Extensive experiments on commonly-used datasets and our newly-collected largest virtual try-on benchmark demonstrate that our MG-VTON significantly outperforms all state-of-the-art methods both qualitatively and quantitatively, showing promising virtual try-on performances.
- Research Article
21
- 10.1108/gkmc-06-2022-0125
- Jan 19, 2023
- Global Knowledge, Memory and Communication
Purpose The purpose of this study is to develop an empirical model by understanding the relative significance of interactive technological forces, such as chatbots, virtual try-on technology (VTO) and e-word-of-mouth (e-WOM), in improving interactive marketing experiences among consumers. This study also validates the moderating role of the perceived effectiveness of e-commerce institutional mechanisms (PEEIM) between attitude and continued intention. Design/methodology/approach Data were collected through personal visits and an online survey. The link to the survey questionnaire was shared on different social media platforms and social networking sites. A total of 362 responses obtained in the online and offline modes were considered for this study. Findings e-WOM emerged as the strongest predictor of attitude, followed by chatbots and VTO. The results of this study revealed that PEEIM did not moderate the relationship between attitude and continued intention. Originality/value Using self-determination theory and behavioral reasoning theory as theoretical frameworks, this study is an initial endeavor in the online shopping context to empirically validate interactive forces like chatbots, VTO and e-WOM, together with PEEIM as a moderator, within a holistic framework. These forces, in turn, act as significant contributors to online shopping satisfaction.