Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Standard
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation. / Jiang, Zhongyu; Zhou, Zhuoran; Li, Lei; Chai, Wenhao; Yang, Cheng-Yen; Hwang, Jenq-Neng.
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2024. pp. 6130-6140.
RIS
TY - GEN
T1 - Back to Optimization
T2 - WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision
AU - Jiang, Zhongyu
AU - Zhou, Zhuoran
AU - Li, Lei
AU - Chai, Wenhao
AU - Yang, Cheng-Yen
AU - Hwang, Jenq-Neng
PY - 2024
Y1 - 2024
N2 - Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge for learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE 51.4mm, without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE 40.3mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW. Our code is available here: https://github.com/ipl-uw/ZeDO-Releas
U2 - 10.1109/WACV57701.2024.00603
DO - 10.1109/WACV57701.2024.00603
M3 - Article in proceedings
SP - 6130
EP - 6140
BT - 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
PB - IEEE
Y2 - 4 January 2024 through 8 January 2024
ER -
ID: 378944073