Multimodal Unsupervised Image-to-Image Translation
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Multimodal Unsupervised Image-to-Image Translation. / Huang, Xun; Liu, Ming-Yu; Belongie, Serge; Kautz, Jan.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, p. 179-196.
Bibtex

@inproceedings{Huang2018MUNIT,
  title     = {Multimodal Unsupervised Image-to-Image Translation},
  author    = {Huang, Xun and Liu, Ming-Yu and Belongie, Serge and Kautz, Jan},
  booktitle = {15th European Conference on Computer Vision, ECCV 2018},
  series    = {Lecture Notes in Computer Science},
  pages     = {179--196},
  year      = {2018},
  doi       = {10.1007/978-3-030-01219-9_11},
  issn      = {0302-9743},
}
RIS
TY - JOUR
T1 - Multimodal Unsupervised Image-to-Image Translation
AU - Huang, Xun
AU - Liu, Ming-Yu
AU - Belongie, Serge
AU - Kautz, Jan
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
AB - Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
KW - GANs
KW - Image-to-image translation
KW - Style transfer
UR - http://www.scopus.com/inward/record.url?scp=85055125404&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01219-9_11
DO - 10.1007/978-3-030-01219-9_11
M3 - Conference article
AN - SCOPUS:85055125404
SP - 179
EP - 196
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -
ID: 301825957
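
The abstract describes translation as recombining a domain-invariant content code with a style code drawn from the target domain's style space. Below is a minimal PyTorch sketch of that idea; all layer widths, module names, and the feature-gating conditioning are illustrative assumptions rather than the authors' architecture (the official implementation is at https://github.com/nvlabs/MUNIT).

```python
# Minimal sketch of the translation mechanism described in the abstract:
# encode an image into a domain-invariant content code, then decode it
# together with a style code drawn from the target domain's style space.
# Everything below is an assumed toy architecture, not the authors' model.
import torch
import torch.nn as nn

STYLE_DIM = 8  # assumed dimensionality of the style code


class ContentEncoder(nn.Module):
    """Maps an image to a domain-invariant content code (a feature map)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class StyleEncoder(nn.Module):
    """Maps an image to a low-dimensional, domain-specific style code."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, STYLE_DIM),
        )

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    """Reconstructs an image from a content code conditioned on a style code."""

    def __init__(self):
        super().__init__()
        self.style_proj = nn.Linear(STYLE_DIM, 128)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh(),
        )

    def forward(self, content, style):
        # Inject style by gating the content features; a crude stand-in for
        # the adaptive-normalization conditioning typically used in practice.
        gate = torch.sigmoid(self.style_proj(style))[:, :, None, None]
        return self.net(content * gate)


def translate(x_a, enc_content_a, dec_b):
    """A->B translation: keep the content code of x_a, recombine it with a
    style code sampled at random from domain B's style space."""
    content = enc_content_a(x_a)
    style_b = torch.randn(x_a.size(0), STYLE_DIM)
    return dec_b(content, style_b)


def translate_guided(x_a, x_b_ref, enc_content_a, enc_style_b, dec_b):
    """Example-guided translation: take the style code from a reference
    image in domain B instead of sampling it, so the user controls style."""
    return dec_b(enc_content_a(x_a), enc_style_b(x_b_ref))


if __name__ == "__main__":
    enc_a, enc_style_b, dec_b = ContentEncoder(), StyleEncoder(), Decoder()
    x_a = torch.rand(1, 3, 64, 64)  # dummy source-domain image
    x_b = torch.rand(1, 3, 64, 64)  # dummy target-domain reference image
    print(translate(x_a, enc_a, dec_b).shape)                           # (1, 3, 64, 64)
    print(translate_guided(x_a, x_b, enc_a, enc_style_b, dec_b).shape)  # (1, 3, 64, 64)
```

Sampling the style code from a simple prior is what makes the mapping one-to-many: each draw yields a different plausible output for the same content, and replacing the random draw with a code encoded from a reference image gives the example-guided style control mentioned at the end of the abstract.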