Speaker: James J. DiCarlo, MD, PhD
Professor of Neuroscience
Head, Department of Brain and Cognitive Sciences
Investigator, McGovern Institute for Brain Research
Massachusetts Institute of Technology, Cambridge, USA
Faculty Host: Dr. Russell Epstein
Title: How does the brain solve visual object recognition?
Visual object recognition is a fundamental building block of memory and cognition, but remains a central unsolved problem in systems neuroscience, human psychophysics, and computer vision (engineering). The computational crux of visual object recognition is that the recognition system must somehow be robust to tremendous image variation produced by different “views” of each object -- the so-called “invariance problem.” The primate brain is an example of a powerful recognition system, and my laboratory aims to understand and emulate its solution to this problem. A key step in isolating and constraining the brain’s solution is to first find the patterns of neuronal activity, and the ways of reading that activity, that quantitatively express the brain’s answer to visual recognition. To that end, we have previously shown that a part of the primate ventral visual stream (inferior temporal cortex, IT) rapidly and automatically conveys neuronal population rate codes that qualitatively solve the invariance problem for vision. While this is a good start, it only weakly constrains the brain’s solution. Thus, we have recently set the bar higher: are such codes quantitatively sufficient to explain behavioral performance? In this talk, I will show how primate systems neuroscience combined with human psychophysics reveals that some (but not all) IT population codes are sufficient to explain human performance on invariant object recognition. This stands in stark contrast to all tested codes in earlier visual areas and all tested computer vision codes, which are insufficient (falsified by experimental data). These results argue that these rapidly and automatically computed IT population codes are common to primate brains, and that they are the direct substrate of object recognition performance.
While this progress constrains and frames the kinds of algorithms we should be searching for in the primate brain, it does not directly reveal their key principles of image encoding or the myriad key “details” of that encoding. This remains an area of active research, and I will conclude by outlining how we aim to combine our experimental results in unsupervised learning with novel computer vision technology to guide us toward discovery of the true underlying cortical algorithm.