Spring 2019 GRASP Seminar Series: Stefano Soatto, UCLA, "Learning Optimal Representations: From the Information Bottleneck to the Dynamic Distance between Learning Tasks"
What is the function of past data that we can store in memory so that, come future data, we can best process it to solve an inference task? I will first formalize a set of desirable properties a representation should have, and derive a variational principle related to the Information Bottleneck of Tishby, Pereira and Bialek. Unfortunately, the corresponding (IB) Lagrangian cannot be computed, let alone optimized. I will then show that there exists a different IB Lagrangian, defined on the model parameters, that is in principle unrelated to the first and is instead related to the empirical loss used when training deep networks, and which can be computed and easily optimized. I will then show that the latter bounds the former, so by optimizing a function of past (training) data we can guarantee desirable properties of the representation of future (test) data, such as sufficiency, minimality, invariance, and disentanglement.

That addresses how to compute an optimal representation for a given task. What if the task is not fully known ahead of time? It is common practice today to train a model on one task (say, finding cats and dogs in images) and then fine-tune it for another (say, detecting tumors in mammograms). Sometimes it works; sometimes it does not. Worse, it is impossible to predict whether it will. I will introduce a new framework to compute the (asymmetric) distance between tasks, and introduce the notion of Task Accessibility, which can predict whether fine-tuning will work, regardless of how "close" two tasks are. Indeed, there are tasks that are quite close, yet it is not possible to fine-tune from one to the other. This universal phenomenon of task inaccessibility is observed in biological systems (critical learning periods) as well as in artificial neural networks, and has nothing to do with biology. Instead, it has to do with the dynamics of learning, which we are only now beginning to uncover.
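For readers unfamiliar with the object the abstract refers to, the classical Information Bottleneck Lagrangian of Tishby, Pereira and Bialek (the standard formulation, not the talk's parameter-based variant) can be written as:

$$
\min_{p(z \mid x)} \; \mathcal{L}\big[p(z \mid x)\big] = I(x; z) - \beta \, I(z; y)
$$

where $x$ is the data, $y$ the task variable, $z$ the representation, $I(\cdot;\cdot)$ denotes mutual information, and $\beta > 0$ trades off minimality (compressing $x$ into $z$) against sufficiency (preserving information about $y$). The talk's first result concerns the fact that these mutual-information terms cannot be computed for future data, motivating the second, computable Lagrangian on the model parameters.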
Stefano Soatto is the founder and director of the UCLA Vision Lab (vision.ucla.edu). He received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after serving as Associate Professor of Electrical and Biomedical Engineering at Washington University, Research Associate in Applied Sciences at Harvard University, Assistant Professor in Mathematics and Computer Science at the University of Udine, Italy, and EAP Fellow at UC Berkeley. He received his D.Ing. degree (highest honors) from the University of Padova, Italy, in 1992. He is currently on leave from UCLA to serve as Director of Applied Science at AWS.