Virtual Try-On Using Diffusion Models

A new paper from research teams at Alibaba presents a new technique for video virtual try-on. ViViD (Video Virtual Try-on using Diffusion Models) uses artificial intelligence to create realistic video simulations of clothing on a person, going beyond the static images typically seen in virtual try-on experiences and in 3D or AR solutions. Instead, it analyzes both garment details and the user’s movements to generate lifelike videos.

At its core, ViViD uses diffusion models. These models are trained on vast amounts of data and excel at generating realistic, detailed video content. Additionally, ViViD employs specialized encoders that capture the details of both clothing and human body poses. This combination allows ViViD to produce footage of the same person as in the original video, but wearing a new outfit.
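For readers who have not met diffusion models before, the short sketch below shows the noising half of the idea: during training, clean data is gradually mixed with Gaussian noise, and the model learns to reverse that process. This is the standard textbook formulation, not code from the ViViD paper. The linear schedule, the 1000 steps, the function names and the four-value "frame" are all assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances that grow linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Compute alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
frame = [0.5, -0.25, 1.0, 0.0]  # hypothetical pixel values scaled to [-1, 1]

slightly_noisy = add_noise(frame, 10, alpha_bars, rng)   # early step: mostly signal
mostly_noise = add_noise(frame, 999, alpha_bars, rng)    # final step: almost pure noise
```

Generation runs this in reverse: a trained network starts from noise and removes a little of it at each step. In a system like ViViD, that network would additionally be guided by the garment and pose information described above.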

In short, ViViD differs from other technical approaches through several key advancements:


Advanced Diffusion Models: Unlike conventional methods, ViViD uses AI models specifically designed for generating high-quality video content. This allows for the creation of realistic videos where the virtual garments adapt to the wearer’s movements.

Dual Encoding: ViViD utilizes two types of encoders: garment encoders that capture details of the clothing, and pose encoders that analyze the user’s body position. This approach ensures that the virtual try-on not only looks accurate but also feels natural as the user moves, capturing nuances like fabric texture and movement.

Spatial-Temporal Consistency: Maintaining consistency in both space (how the garment sits on the body) and time (how it moves throughout the video) is a major challenge in virtual try-on technology. ViViD’s algorithms address this by keeping the virtual garment in sync with the user’s movements.
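To make the idea of temporal consistency a bit more concrete, here is a minimal sketch of one crude way to spot flicker: average how much each pixel changes from one frame to the next. This is not the metric or the method used in the ViViD paper. The function name and the tiny two-pixel "frames" are made up for illustration, and a serious evaluation would look only at the garment region and account for legitimate motion.

```python
def mean_abs_frame_difference(frames):
    """Average absolute pixel change between consecutive frames.

    Each frame is a flat list of pixel intensities. Lower values mean the
    video changes less from frame to frame.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same number of pixels")
        for a, b in zip(prev, curr):
            total += abs(a - b)
            count += 1
    return total / count


# Hypothetical clips: one drifts gently, the other jumps back and forth.
smooth_clip = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3]]
flickering_clip = [[0.0, 0.1], [0.9, 0.8], [0.0, 0.1]]

smooth_score = mean_abs_frame_difference(smooth_clip)
flicker_score = mean_abs_frame_difference(flickering_clip)
```

The flickering clip scores much higher than the smooth one, which is the kind of frame-to-frame instability a video try-on system has to avoid when a garment is regenerated in every frame.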


Great potential for ecommerce

We’ve previously covered AI-powered virtual try-on solutions on the site, looking into how products can be better visualized for online customers, both on product page models and, in the long run, on the customers themselves. If you look at how big online retailers like Asos use video content today to showcase garments and shoes, it’s easy to imagine how their production flow could be entirely automated using just a few basic videos of human models.

This would also make it much easier for retailers to show greater diversity in body types and skin tones online, something many shoppers ask for today. And in the end, you could simply upload a video of yourself and get an instant visualization, on each product page, of how you would look wearing different items.

Read the paper here.