Multimodal Unsupervised Image-to-Image Translation
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Multimodal Unsupervised Image-to-Image Translation. / Huang, Xun; Liu, Ming-Yu; Belongie, Serge; Kautz, Jan.
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, p. 179-196.
Bibtex

@inproceedings{Huang2018MUNIT,
  title     = {Multimodal Unsupervised Image-to-Image Translation},
  author    = {Huang, Xun and Liu, Ming-Yu and Belongie, Serge and Kautz, Jan},
  booktitle = {15th European Conference on Computer Vision, ECCV 2018},
  series    = {Lecture Notes in Computer Science},
  pages     = {179--196},
  year      = {2018},
  doi       = {10.1007/978-3-030-01219-9_11},
  issn      = {0302-9743},
}
RIS
TY - JOUR
T1 - Multimodal Unsupervised Image-to-Image Translation
AU - Huang, Xun
AU - Liu, Ming-Yu
AU - Belongie, Serge
AU - Kautz, Jan
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
AB - Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.
KW - GANs
KW - Image-to-image translation
KW - Style transfer
UR - http://www.scopus.com/inward/record.url?scp=85055125404&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01219-9_11
DO - 10.1007/978-3-030-01219-9_11
M3 - Conference article
AN - SCOPUS:85055125404
SP - 179
EP - 196
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -
ID: 301825957
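
The abstract describes translation as recombining a domain-invariant content code with a style code drawn from the target domain's style space. Below is a minimal PyTorch sketch of that idea; all layer widths, module names, and the feature-gating conditioning are illustrative assumptions rather than the authors' architecture (the official implementation is at https://github.com/nvlabs/MUNIT).

```python
# Minimal sketch of the translation mechanism described in the abstract:
# encode an image into a domain-invariant content code, then decode it
# together with a style code drawn from the target domain's style space.
# Everything below is an assumed toy architecture, not the authors' model.
import torch
import torch.nn as nn

STYLE_DIM = 8  # assumed dimensionality of the style code


class ContentEncoder(nn.Module):
    """Maps an image to a domain-invariant content code (a feature map)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class StyleEncoder(nn.Module):
    """Maps an image to a low-dimensional, domain-specific style code."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, STYLE_DIM),
        )

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    """Reconstructs an image from a content code conditioned on a style code."""

    def __init__(self):
        super().__init__()
        self.style_proj = nn.Linear(STYLE_DIM, 128)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 7, stride=1, padding=3), nn.Tanh(),
        )

    def forward(self, content, style):
        # Inject style by gating the content features; a crude stand-in for
        # the adaptive-normalization conditioning typically used in practice.
        gate = torch.sigmoid(self.style_proj(style))[:, :, None, None]
        return self.net(content * gate)


def translate(x_a, enc_content_a, dec_b):
    """A->B translation: keep the content code of x_a, recombine it with a
    style code sampled at random from domain B's style space."""
    content = enc_content_a(x_a)
    style_b = torch.randn(x_a.size(0), STYLE_DIM)
    return dec_b(content, style_b)


def translate_guided(x_a, x_b_ref, enc_content_a, enc_style_b, dec_b):
    """Example-guided translation: take the style code from a reference
    image in domain B instead of sampling it, so the user controls style."""
    return dec_b(enc_content_a(x_a), enc_style_b(x_b_ref))


if __name__ == "__main__":
    enc_a, enc_style_b, dec_b = ContentEncoder(), StyleEncoder(), Decoder()
    x_a = torch.rand(1, 3, 64, 64)  # dummy source-domain image
    x_b = torch.rand(1, 3, 64, 64)  # dummy target-domain reference image
    print(translate(x_a, enc_a, dec_b).shape)                           # (1, 3, 64, 64)
    print(translate_guided(x_a, x_b, enc_a, enc_style_b, dec_b).shape)  # (1, 3, 64, 64)
```

Sampling the style code from a simple prior is what makes the mapping one-to-many: each draw yields a different plausible output for the same content, and replacing the random draw with a code encoded from a reference image gives the example-guided style control mentioned at the end of the abstract.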