Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Back to Optimization : Diffusion-based Zero-Shot 3D Human Pose Estimation. / Jiang, Zhongyu ; Zhou, Zhuoran ; Li, Lei ; Chai, Wenhao ; Yang, Cheng-Yen ; Hwang, Jenq-Neng.

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2024. p. 6130-6140.

Harvard

Jiang, Z, Zhou, Z, Li, L, Chai, W, Yang, C-Y & Hwang, J-N 2024, Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation. in 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, pp. 6130-6140, WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, Hawaii, United States, 04/01/2024. https://doi.org/10.1109/WACV57701.2024.00603

APA

Jiang, Z., Zhou, Z., Li, L., Chai, W., Yang, C-Y., & Hwang, J-N. (2024). Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 6130-6140). IEEE. https://doi.org/10.1109/WACV57701.2024.00603

Vancouver

Jiang Z, Zhou Z, Li L, Chai W, Yang C-Y, Hwang J-N. Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE. 2024. p. 6130-6140 https://doi.org/10.1109/WACV57701.2024.00603

Author

Jiang, Zhongyu ; Zhou, Zhuoran ; Li, Lei ; Chai, Wenhao ; Yang, Cheng-Yen ; Hwang, Jenq-Neng. / Back to Optimization : Diffusion-based Zero-Shot 3D Human Pose Estimation. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2024. pp. 6130-6140

Bibtex

@inproceedings{59068f9dbb4843db988bf762bc8417f0,
title = "Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation",
abstract = "Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE 51.4mm, without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE 40.3mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW. Our code is available here: https://github.com/ipl-uw/ZeDO-Releas",
author = "Zhongyu Jiang and Zhuoran Zhou and Lei Li and Wenhao Chai and Cheng-Yen Yang and Jenq-Neng Hwang",
year = "2024",
doi = "10.1109/WACV57701.2024.00603",
language = "English",
pages = "6130--6140",
booktitle = "2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)",
publisher = "IEEE",
note = "WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision ; Conference date: 04-01-2024 Through 08-01-2024",

}

RIS

TY - GEN

T1 - Back to Optimization

T2 - WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision

AU - Jiang, Zhongyu

AU - Zhou, Zhuoran

AU - Li, Lei

AU - Chai, Wenhao

AU - Yang, Cheng-Yen

AU - Hwang, Jenq-Neng

PY - 2024

Y1 - 2024

N2 - Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE 51.4mm, without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE 40.3mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW. Our code is available here: https://github.com/ipl-uw/ZeDO-Releas

U2 - 10.1109/WACV57701.2024.00603

DO - 10.1109/WACV57701.2024.00603

M3 - Article in proceedings

SP - 6130

EP - 6140

BT - 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

PB - IEEE

Y2 - 4 January 2024 through 8 January 2024

ER -

ID: 378944073