Abstract: My research group is focused on a variety of approaches for
video analysis and synthesis. In this talk, I will focus on two of our recent
efforts: one aimed at robust
spatio-temporal segmentation of video, and another at using motion and flow to
predict actions from video.
In the first part of the talk, I will present an efficient
and scalable technique for spatio-temporal segmentation of long video sequences
using a hierarchical graph-based algorithm. In this effort, we begin by
over-segmenting a volumetric video graph into space-time regions grouped by
appearance. We then construct a “region graph” over the obtained segmentation
and iteratively repeat this process over multiple levels to create a tree of
spatio-temporal segmentations. This hierarchical approach generates
high-quality segmentations that are temporally coherent, with stable region
boundaries, and allows subsequent applications to choose from varying levels of
granularity. We further improve segmentation quality by using dense optical
flow to guide temporal connections in the initial graph. I will demonstrate a
variety of examples of how this robust segmentation works, and will show
additional examples of video retargeting that use saliency derived from this
segmentation approach. (Matthias
Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with
Google Research).
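To make the iterative grouping concrete, here is a minimal sketch on a 1-D signal. The actual method operates on a volumetric space-time video graph with appearance-based edge weights and optical-flow-guided temporal links; the thresholds, the 1-D adjacency, and the function name here are illustrative assumptions, not the paper's implementation. Each element starts as its own region; at every level we form a "region graph" whose edges weigh the difference in mean appearance between adjacent regions, merge edges cheaper than the current threshold, then relax the threshold, yielding a tree of progressively coarser segmentations.

```python
import numpy as np

def hierarchical_segmentation(signal, tau0=1.0, levels=3):
    """Toy hierarchical graph-based grouping on a 1-D appearance signal.

    Returns one label array per level of the hierarchy, from finest
    (every element its own region) to coarsest.
    """
    labels = np.arange(len(signal))        # finest over-segmentation
    hierarchy = [labels.copy()]
    tau = tau0
    for _ in range(levels):
        # mean appearance per region: the node attribute of the region graph
        means = {r: signal[labels == r].mean() for r in np.unique(labels)}
        new = labels.copy()
        # scan edges between adjacent regions (the 1-D region graph)
        for i in range(1, len(signal)):
            a, b = new[i - 1], new[i]
            if a != b and abs(means[a] - means[b]) < tau:
                new[new == b] = a                      # merge the cheap edge
                means[a] = signal[new == a].mean()     # refresh appearance
        labels = new
        hierarchy.append(labels.copy())
        tau *= 2.0                                     # coarser next level
    return hierarchy
```

Raising the threshold between levels is what produces the tree structure: regions kept separate at a fine level can still merge at a coarser one, so an application can pick whichever granularity suits its task.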
In the second part of this talk, I will show that
constrained multi-agent events can be analyzed and even predicted from video.
Such analysis requires estimating the global movements of all players in the
scene at any time, and is needed for modeling and predicting how the
multi-agent play evolves over time on the field. To this end, we propose a
novel approach to detect the locations where the play will evolve,
e.g., where interesting events will occur, by tracking player positions
and movements over time. To achieve this, we extract the ground level sparse
movement of players in each time-step, and then generate a dense motion field.
Using this field we detect locations where the motion converges, implying
positions towards which the play is evolving. I will show examples of how we
have tested this approach for soccer, basketball and hockey. (Kihwan Kim,
Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa,
CVPR 2010, in collaboration with Disney Research).
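The sparse-to-dense step and the convergence test can be sketched as follows. This is a hedged illustration, not the paper's method: I use inverse-distance weighting to densify the sparse ground-plane velocities and negative divergence as the convergence score; the grid resolution, field coordinates, and function names are assumptions made for the example.

```python
import numpy as np

def dense_motion_field(positions, velocities, grid_size=20, eps=1e-6):
    """Interpolate sparse ground-plane player velocities into a dense
    2-D motion field via inverse-distance weighting.

    positions, velocities: (N, 2) arrays in normalized field coordinates.
    Returns a (grid_size, grid_size, 2) array of (u, v) vectors.
    """
    xs = np.linspace(0.0, 1.0, grid_size)
    gx, gy = np.meshgrid(xs, xs)           # gx varies along columns (x)
    field = np.zeros((grid_size, grid_size, 2))
    for i in range(grid_size):
        for j in range(grid_size):
            d = np.linalg.norm(positions - np.array([gx[i, j], gy[i, j]]),
                               axis=1)
            w = 1.0 / (d ** 2 + eps)       # nearby players dominate
            field[i, j] = (w[:, None] * velocities).sum(0) / w.sum()
    return field

def convergence_map(field):
    """Negative divergence of the motion field: peaks mark locations the
    motion flows toward, i.e. where the play is evolving."""
    du_dx = np.gradient(field[..., 0], axis=1)   # axis 1 is x
    dv_dy = np.gradient(field[..., 1], axis=0)   # axis 0 is y
    return -(du_dx + dv_dy)
```

With four players moving toward the center of the field, the convergence map peaks near the center, which is the kind of location the approach flags as where the play is heading.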
Time permitting, I will show some more videos of our recent
work on video analysis and synthesis. For more information, papers, and videos,
see my website at http://prof.irfanessa.com/