Abstract
Multimodal learning research has advanced rapidly over the past decade across a variety of fields, especially computer vision. With the growing availability of multimodal streaming data and deep learning algorithms, deep multimodal learning is increasingly common, which calls for models that can reliably process and interpret multimodal data. Unstructured real-world data, whose distinct forms are known as modalities, naturally takes many shapes, most often text and images. Extracting useful patterns from such data remains a driving challenge for deep learning researchers. Well-organized product catalogues are crucial for enhancing customers' experience as they explore the plethora of options offered by online marketplaces, and the availability of product attributes such as colour or material is a key component of this. However, on several of the marketplaces we focus on, attribute data is frequently erroneous or missing. One promising approach to this problem is to use deep models trained on large corpora to predict attributes from unstructured data such as product descriptions and photographs (referred to as modalities in this study). This paper provides a comprehensive overview of multi-modal colour extraction techniques, together with their advantages, drawbacks, and open challenges.
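To make the attribute-prediction setting concrete, the following is a minimal sketch of a late-fusion multimodal classifier of the kind surveyed here: a pretrained image encoder and a text encoder produce per-modality embeddings, which are concatenated and passed to a head that predicts a colour attribute. All names (`MultimodalColorClassifier`, the vocabulary size, the number of colour classes) are hypothetical, and the bag-of-words text encoder stands in for the large pretrained language models discussed in the survey.

```python
# A minimal sketch, assuming PyTorch with a pretrained torchvision ResNet-18
# as the image encoder and a simple bag-of-words text encoder as a stand-in
# for a large pretrained language model. Class and parameter names are
# hypothetical, for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class MultimodalColorClassifier(nn.Module):
    """Late fusion: concatenate image and text embeddings, predict colour."""

    def __init__(self, vocab_size: int, num_colors: int, text_dim: int = 128):
        super().__init__()
        # Pretrained image encoder; classification layer replaced so the
        # network emits 512-dimensional features instead of logits.
        self.image_encoder = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.image_encoder.fc = nn.Identity()
        # Bag-of-words text encoder (mean-pooled token embeddings).
        self.text_encoder = nn.EmbeddingBag(vocab_size, text_dim)
        # Fusion head over the concatenated modality embeddings.
        self.head = nn.Sequential(
            nn.Linear(512 + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_colors),
        )

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(image)     # (batch, 512)
        txt_feat = self.text_encoder(token_ids)  # (batch, text_dim)
        fused = torch.cat([img_feat, txt_feat], dim=1)
        return self.head(fused)                  # colour logits

# Usage with dummy inputs: a batch of product photos and tokenised descriptions.
model = MultimodalColorClassifier(vocab_size=10_000, num_colors=12)
dummy_images = torch.randn(2, 3, 224, 224)
dummy_tokens = torch.randint(0, 10_000, (2, 16))
logits = model(dummy_images, dummy_tokens)       # shape: (2, 12)
```

Late fusion is only one of the strategies covered in this survey; early-fusion and cross-attention approaches combine the modalities before or during encoding rather than at the feature level.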