Although half a century has passed since Frank Rosenblatt’s original work on multilayer perceptrons, modern artificial neural networks are still surprisingly similar to his original ideas.
In this talk, I will question one of their most fundamental design aspects. As networks have become much deeper than was possible, or even imagined, in the 1950s, it is no longer clear that the layer-by-layer connectivity pattern is a well-suited architectural choice. In the first part of the talk, I will show that randomly removing layers during training can speed up the training process, make it more robust, and ultimately lead to better generalization. We refer to this process as learning with stochastic depth, because the effective depth of the network varies from minibatch to minibatch. In the second part of the talk, I will propose an alternative connectivity pattern, dense connectivity, which is inspired by the insights obtained from stochastic depth. Dense connectivity leads to substantial reductions in parameter count, faster convergence, and further improvements in generalization. Finally, I will investigate the question of why deep neural networks are so well suited to natural images and provide evidence that they linearize the underlying sub-manifold into a Euclidean feature space.
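To make the idea of stochastic depth concrete, here is a minimal NumPy sketch of a forward pass in which each residual block is randomly skipped during training. It is an illustration under stated assumptions, not the implementation used in the talk: the names `survival_probs`, `residual_fn`, and `forward` are hypothetical, the toy block is a single ReLU layer, and the linearly decaying survival schedule and the test-time scaling by the survival probability are assumptions that the abstract itself does not specify.

```python
import numpy as np


def survival_probs(num_blocks, p_last=0.5):
    """Assumed schedule: survival probability decays linearly with depth,
    ending at p_last for the final block."""
    return [1.0 - (l / num_blocks) * (1.0 - p_last)
            for l in range(1, num_blocks + 1)]


def residual_fn(x, w):
    """Toy residual branch (hypothetical): one linear map followed by ReLU."""
    return np.maximum(x @ w, 0.0)


def forward(x, weights, probs, training, rng):
    """Run a stack of residual blocks with stochastic depth.

    Returns the output and the number of blocks actually executed,
    i.e. the effective depth for this minibatch.
    """
    depth = 0
    for w, p in zip(weights, probs):
        if training:
            # Keep the block with probability p; otherwise it reduces
            # to the identity and is skipped entirely.
            if rng.random() < p:
                x = x + residual_fn(x, w)
                depth += 1
        else:
            # At test time every block is active, with its residual
            # branch scaled by its survival probability.
            x = x + p * residual_fn(x, w)
            depth += 1
    return x, depth


rng = np.random.default_rng(0)
num_blocks, width = 4, 8
weights = [0.1 * rng.standard_normal((width, width)) for _ in range(num_blocks)]
probs = survival_probs(num_blocks)
batch = rng.standard_normal((2, width))

_, train_depth = forward(batch, weights, probs, training=True, rng=rng)
_, test_depth = forward(batch, weights, probs, training=False, rng=rng)
print("effective depth (training):", train_depth)
print("effective depth (test):", test_depth)
```

Calling `forward` repeatedly with `training=True` yields a different effective depth from one minibatch to the next, while `training=False` always runs all blocks.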