Text-Driven Stylization of Video Objects

Publication: Conference contribution › Paper › Research

Standard

Text-Driven Stylization of Video Objects. / Loeschcke, Sebastian; Belongie, Serge; Benaim, Sagie.

2022. Paper presented at CVEU @ ECCV 2022, Tel Aviv, Israel.


Harvard

Loeschcke, S, Belongie, S & Benaim, S 2022, 'Text-Driven Stylization of Video Objects', Paper presented at CVEU @ ECCV 2022, Tel Aviv, Israel, 24/10/2022.

APA

Loeschcke, S., Belongie, S., & Benaim, S. (2022). Text-Driven Stylization of Video Objects. Paper presented at CVEU @ ECCV 2022, Tel Aviv, Israel.

Vancouver

Loeschcke S, Belongie S, Benaim S. Text-Driven Stylization of Video Objects. 2022. Paper presented at CVEU @ ECCV 2022, Tel Aviv, Israel.

Author

Loeschcke, Sebastian; Belongie, Serge; Benaim, Sagie. / Text-Driven Stylization of Video Objects. Paper presented at CVEU @ ECCV 2022, Tel Aviv, Israel. 17 p.

BibTeX

@conference{581aa332f44d419aaaa9254840eed121,
title = "Text-Driven Stylization of Video Objects",
abstract = "We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging taskas the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resultingstylization must preserve both the global semantics of the object and its finegrained details, and (3) it must adhere to the user-specified text prompt. Tothis end, our method stylizes an object in a video according to two targettexts. The first target text prompt describes the global semantics and the second target text prompt describes the local semantics. To modify the style ofan object, we harness the representational power of CLIP to get a similarity score between (1) the local target text and a set of local stylized views,and (2) a global target text and a set of stylized global views. We use a pretrained atlas decomposition network to propagate the edits in a temporallyconsistent manner. We demonstrate that our method can generate consistent style changes over time for a variety of objects and videos, that adhere to the specification of the target texts. We also show how varying thespecificity of the target texts and augmenting the texts with a set of prefixes results in stylizations with different levels of detail. Full results are givenin the supplementary and in full resolution in the project webpage: https://sloeschcke.github.io/Text-Driven-Stylization-of-Video-Objects/.",
author = "Sebastian Loeschcke and Serge Belongie and Sagie Benaim",
year = "2022",
language = "English",
note = "CVEU @ ECCV 2022 : AI for Creative Video Editing and Understanding ECCV Workshop ; Conference date: 24-03-2024",

}

RIS

TY - CONF

T1 - Text-Driven Stylization of Video Objects

AU - Loeschcke, Sebastian

AU - Belongie, Serge

AU - Benaim, Sagie

PY - 2022

Y1 - 2022

N2 - We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resulting stylization must preserve both the global semantics of the object and its fine-grained details, and (3) it must adhere to the user-specified text prompt. To this end, our method stylizes an object in a video according to two target texts. The first target text prompt describes the global semantics and the second target text prompt describes the local semantics. To modify the style of an object, we harness the representational power of CLIP to get a similarity score between (1) the local target text and a set of local stylized views, and (2) a global target text and a set of stylized global views. We use a pretrained atlas decomposition network to propagate the edits in a temporally consistent manner. We demonstrate that our method can generate consistent style changes over time for a variety of objects and videos that adhere to the specification of the target texts. We also show how varying the specificity of the target texts and augmenting the texts with a set of prefixes results in stylizations with different levels of detail. Full results are given in the supplementary and in full resolution on the project webpage: https://sloeschcke.github.io/Text-Driven-Stylization-of-Video-Objects/.

AB - We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resulting stylization must preserve both the global semantics of the object and its fine-grained details, and (3) it must adhere to the user-specified text prompt. To this end, our method stylizes an object in a video according to two target texts. The first target text prompt describes the global semantics and the second target text prompt describes the local semantics. To modify the style of an object, we harness the representational power of CLIP to get a similarity score between (1) the local target text and a set of local stylized views, and (2) a global target text and a set of stylized global views. We use a pretrained atlas decomposition network to propagate the edits in a temporally consistent manner. We demonstrate that our method can generate consistent style changes over time for a variety of objects and videos that adhere to the specification of the target texts. We also show how varying the specificity of the target texts and augmenting the texts with a set of prefixes results in stylizations with different levels of detail. Full results are given in the supplementary and in full resolution on the project webpage: https://sloeschcke.github.io/Text-Driven-Stylization-of-Video-Objects/.

M3 - Paper

T2 - CVEU @ ECCV 2022

Y2 - 24 October 2022

ER -

ID: 384566532
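
Note: the abstract describes scoring stylized views against two target texts with CLIP. As a rough sketch of that scoring idea only, not the authors' implementation or training objective, the Python snippet below averages CLIP image-text cosine similarities over a set of views using the open-source clip package; the function name, the ViT-B/32 backbone, and the equal weighting of the global and local terms are illustrative assumptions.

import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption

def mean_clip_similarity(pil_views, target_text):
    # Encode a batch of rendered views and a single target text, then
    # average the cosine similarities between each view and the text.
    images = torch.stack([preprocess(v) for v in pil_views]).to(device)
    tokens = clip.tokenize([target_text]).to(device)
    with torch.no_grad():
        image_emb = model.encode_image(images)
        text_emb = model.encode_text(tokens)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).mean().item()

# Hypothetical usage: combine the two scores from the abstract. Equal
# weighting of the global and local terms is an assumption, not the
# paper's stated objective.
# score = (mean_clip_similarity(global_views, global_text)
#          + mean_clip_similarity(local_views, local_text))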