Direct Speech-to-Image Translation

Jiguo Li, Xinfeng Zhang, Li Zhang, Siwei Ma, Yue Wang, Wen Gao, Chuanmin Jia, Jizheng Xu

doi:10.1109/jstsp.2020.2987417

Abstract

Direct speech-to-image translation without text is an interesting and useful topic because of its potential applications in human-computer interaction, art creation, computer-aided design, and so on, and because many languages have no written form. However, as far as we know, how to translate speech signals into images directly, and how well they can be translated, has not been well studied. In this paper, we attempt to translate speech signals into images without a transcription stage. Specifically, a speech encoder is designed to represent the input speech signal as an embedding feature, and it is trained with a pretrained image encoder using teacher-student learning to obtain better generalization to new classes. Subsequently, a stacked generative adversarial network synthesizes high-quality images conditioned on the embedding feature. Experimental results on both synthesized and real data show that the proposed method is effective at translating raw speech signals into images without an intermediate text representation. An ablation study gives further insight into the method.
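The teacher-student step mentioned in the abstract can be illustrated with a toy sketch. The abstract does not state the loss, the encoder architectures, or the training procedure, so everything below is an assumption made for illustration only: a mean-squared-error objective, a linear student, random vectors standing in for speech features, and a fixed target matrix standing in for the output of a frozen pretrained image encoder. The names `distillation_loss` and `train_student` are hypothetical and are not taken from the paper.

```python
import numpy as np

def distillation_loss(student_emb, teacher_emb):
    """Mean squared error between student and frozen teacher embeddings (assumed loss)."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

def train_student(speech_feats, teacher_emb, lr=1.0, steps=200):
    """Fit a linear student W so that speech_feats @ W approximates teacher_emb.

    The teacher embeddings are treated as constants: only W is updated.
    """
    n, d_in = speech_feats.shape
    d_out = teacher_emb.shape[1]
    W = np.zeros((d_in, d_out))
    losses = []
    for _ in range(steps):
        pred = speech_feats @ W
        losses.append(distillation_loss(pred, teacher_emb))
        # Gradient of the mean squared error with respect to W.
        grad = 2.0 * speech_feats.T @ (pred - teacher_emb) / (n * d_out)
        W -= lr * grad
    return W, losses

# Toy stand-ins (assumptions): 64 utterances, 16-dim speech features, 8-dim embeddings.
rng = np.random.default_rng(0)
speech_feats = rng.normal(size=(64, 16))
teacher_emb = speech_feats @ rng.normal(size=(16, 8))  # pretend image-encoder outputs

W, losses = train_student(speech_feats, teacher_emb)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this sketch the student learns to place each utterance near the embedding of its paired image, which is the general idea behind training a speech encoder against a pretrained image encoder; a generator would then be conditioned on the student's output.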

Journal: IEEE Journal of Selected Topics in Signal Processing
Publication Date: Mar 1, 2020
Citations: 75

Similar Papers

Speech Encryption with Fractional Watermark
Yan Sun ... Cun Zhu
01 Jan 2021
Computers, Materials & Continua | VOL. 73

TripCEAiR: A multi-loss minimization approach for surface EMG based airwriting recognition
Ayush Tripathi ... Lalan Kumar
10 May 2023
Biomedical Signal Processing and Control | VOL. 85

Eye Blink Detection for Smart Glasses
Hoang Le ... Thanh Dang
01 Dec 2013

Multi-modal affect detection for learning applications
Yash Gogia ... Shreyash Mohatta
01 Nov 2016
