understanding is the organization of video data into sets of events with
associated temporal dependencies. For example, a soccer goal could be explained
using a vocabulary of events such as passing, dribbling, tackling, etc. In
describing the dependencies between events it is natural to invoke the concept
of causality, but previous attempts to perform causal reasoning in video
analysis have been limited to special cases, such as sporting events or naïve
physics, where strong domain models are available. In this talk I will describe
a novel, data-driven approach to the analysis of causality in video. The key to
our approach is the representation of low-level visual events as the output of
a multivariate point process, and the use of a nonparametric formulation of
temporal causality to group event data into interacting subsets. This grouping process
differs from standard motion segmentation methods in that it exploits the
temporal structure in video over extended time scales. We apply our method to
the analysis of complex social interactions in video. Specifically, we address
the analysis and retrieval of social games between parents and children from
unstructured video collections. This application is part of a larger effort to
use computer vision technologies to improve the detection, treatment, and
understanding of developmental disorders such as autism.
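
To make the idea of temporal causality between event streams concrete, here is a minimal sketch. It is not the method presented in the talk: the abstract does not specify the nonparametric formulation, so a simple linear Granger-style test on binned event counts is swapped in as a stand-in. The function names (bin_events, lagged, granger_score), the bin width, the lag order, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def bin_events(times, t_end, dt):
    """Turn event timestamps (a point-process realization) into bin counts."""
    edges = np.arange(0.0, t_end + dt, dt)
    counts, _ = np.histogram(times, bins=edges)
    return counts.astype(float)

def lagged(x, p):
    """Row for time t holds the p past values x[t-1], ..., x[t-p]."""
    n = len(x)
    return np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])

def granger_score(x, y, p=5):
    """Log ratio of residual sums of squares when predicting y from its own
    past versus its own past plus the past of x. A larger value suggests
    that x helps predict y (illustrative linear stand-in only)."""
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])
    full = np.hstack([restricted, lagged(x, p)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    return np.log(rss(restricted) / rss(full))

# Synthetic example (hypothetical data): stream y echoes stream x
# about 0.3 s later, with a little timing jitter.
rng = np.random.default_rng(0)
t_end, dt = 200.0, 0.1
x_times = rng.uniform(0.0, t_end - 1.0, size=300)
y_times = x_times + 0.3 + rng.normal(0.0, 0.02, size=x_times.size)

x_counts = bin_events(x_times, t_end, dt)
y_counts = bin_events(y_times, t_end, dt)

score_xy = granger_score(x_counts, y_counts)  # does x help predict y?
score_yx = granger_score(y_counts, x_counts)  # does y help predict x?
print(f"x -> y: {score_xy:.3f}, y -> x: {score_yx:.3f}")
```

In a grouping setting, pairwise scores of this kind could be collected into a matrix over all event streams and thresholded or clustered to find interacting subsets; that step is likewise an assumption, not a description of the talk's algorithm.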
This is joint work with Karthir Prabhakar, Sangmin Oh, Ping Wang, and
Gregory Abowd.