Pioneer Centre for AI Talk: Fisher Yu
Title
Towards Complete 4D Scene Understanding
Abstract
Understanding semantics and motion in dynamic 3D scenes is foundational for many robotics applications, especially
autonomous driving. I aim to achieve complete 4D scene understanding through the lens of monocular cameras. We need not only to predict semantic, geometric, and dynamic information in 4D scenes accurately but also to generalize to rare events and different scenes. In this talk, I will highlight our solutions and attempts to lay a solid foundation for attacking the accuracy and generalization problems. Our main insight is that the vast amount of temporal information on objects and frames in videos, a natural form of visual input, is both a challenge and an opportunity for our learning algorithms. To illustrate this insight, I will showcase our state-of-the-art models and algorithms for producing high-quality segmentation, tracking, and 3D predictions on videos. In addition, I will discuss our recent work on using video information to learn more robust models that can learn concepts in long-tail distributions and cope with continuously changing environments. I hope to convince you that complete 4D scene understanding is an exciting frontier for computer vision and that it can have a great impact on technology development.
Bio
Fisher Yu is an Assistant Professor at ETH Zürich in Switzerland. He obtained his Ph.D. from Princeton University and was a postdoctoral researcher at UC Berkeley. He now leads the Visual Intelligence and Systems (VIS) group at ETH Zürich. His goal is to build perceptual systems capable of performing complex tasks in complex environments. His research lies at the junction of machine learning, computer vision, and robotics. He currently works on closing the loop between vision and action. His work on image representation learning and large-scale datasets, especially dilated convolutions and the BDD100K dataset, has become an essential part of computer vision research.