Automatic video summary generation using multiple sources

MSc thesis defence by Anders Spælling and Lauge Jepsen


The amount of amateur footage has increased rapidly over the last few years, along with the high availability of fast mobile connections and the widespread adoption of smartphones. Much of this footage is recorded at specific events, by different people, who later make it available for public viewing on e.g. YouTube. We argue that there is a growing demand for a way to automatically aggregate videos related to the same event into summaries. In this project we study the possibilities for automated video summary generation by assessing video image quality and identifying contextual properties using e.g. optical flow estimation and Haar cascade classifiers.
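The optical flow estimation mentioned above can be illustrated with a minimal sketch (not part of the thesis itself): a single-translation Lucas-Kanade estimate in NumPy, which recovers one global motion vector between two frames by least squares over the image gradients. The function and variable names are illustrative, and the thesis may use a different flow method.

```python
import numpy as np

def lucas_kanade_translation(frame1, frame2):
    """Estimate one global (dx, dy) translation between two grayscale
    frames using the Lucas-Kanade least-squares formulation:
    minimize sum((Ix*dx + Iy*dy + It)^2) over all pixels."""
    iy, ix = np.gradient(frame1)          # spatial gradients
    it = frame2 - frame1                  # temporal gradient
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    flow, *_ = np.linalg.lstsq(A, -it.ravel(), rcond=None)
    return flow                           # array([dx, dy])

# Synthetic check: a Gaussian blob shifted one pixel to the right
# should yield a flow vector close to (1, 0).
y, x = np.mgrid[0:32, 0:32]
blob = lambda cx, cy: np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * 5.0**2))
dx, dy = lucas_kanade_translation(blob(15, 16), blob(16, 16))
```

In a real pipeline this would run per block or per feature point rather than globally, so that camera motion can be separated from subject motion.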

Based on these contextual properties we generate video summaries using a custom algorithm. We were able to successfully detect e.g. public speakers, crowds, police blinker lights, and overview shots, which allowed us to generate video summaries with a solid contextual flow. Our test panel generally provided positive feedback. In this thesis we show that generating video summaries is computationally feasible and that the generated summaries are comparable to human-edited summaries.

Supervisor: Kim Steenstrup Pedersen

Censor: Rasmus Larsen, IMM DTU