Unsupervised learning of sparse spatio-temporal receptive fields through inhibitory plasticity: a model of the mammalian early visual system
Receptive fields of V1 simple cells in mammalian visual cortex are characterised as localised, oriented bandpass filters that respond to oriented contrast edges of a specific spatial scale in a particular area of the visual field. Olshausen and Field showed that V1-like receptive fields can be obtained with an algorithm that learns basis functions from natural image patches by maximising sparsity whilst preserving information. However, their algorithm lacks biological realism on several fronts. First, it uses static image patches, whereas real brain activity is dynamic and input is constantly changing. Second, it assumes a rate code, assigning real numbers to the activity of neurons instead of timed events as in spike trains. Our goal is to develop a spiking network that employs event-based synaptic update rules to learn dynamic receptive fields.

Input to our system will be provided by an event-based camera that emits "spikes" whenever the brightness at a pixel crosses a threshold (Dynamic Vision Sensor (DVS), Inivation, Zurich, Switzerland). In addition, we aim to learn receptive fields in real time by leveraging the neuromorphic SpiNNaker platform to accelerate the spiking network simulation. We use a balanced excitatory/inhibitory (E/I) network with spike-timing-dependent plasticity at inhibitory synapses, comprising 800 excitatory (E) and 200 inhibitory (I) neurons. The first step was to parameterise the network in the asynchronous-irregular (AI) regime to ensure it remains responsive to input; we achieved an AI state even under temporally dynamic input. The next step was to expose the network to defined stimuli for which the ground truth of the sparse basis functions the network is supposed to learn is known. To this end, we generated surrogate data consisting of oriented spike fronts travelling across the receptive field at a set of predefined orientations, mimicking what the DVS would output in response to moving oriented lines.
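The inhibitory spike-timing-dependent plasticity rule is not specified in detail here; a minimal sketch of one commonly used symmetric inhibitory STDP rule (in the style of Vogels et al., 2011), with illustrative constants rather than the model's actual parameters, is:

```python
# Sketch of a symmetric inhibitory STDP rule (Vogels-et-al.-2011 style);
# the abstract does not name the exact rule, so constants are illustrative.
TAU = 0.020              # trace time constant (s)
ETA = 1e-3               # learning rate
RHO0 = 5.0               # target rate of excitatory cells (Hz)
ALPHA = 2 * RHO0 * TAU   # depression bias applied on presynaptic spikes

def on_pre_spike(w, x_post):
    """Weight update when the inhibitory neuron fires;
    x_post is the postsynaptic spike trace at that moment."""
    return max(0.0, w + ETA * (x_post - ALPHA))

def on_post_spike(w, x_pre):
    """Weight update when the excitatory neuron fires;
    x_pre is the presynaptic spike trace at that moment."""
    return max(0.0, w + ETA * x_pre)
```

Under this rule, inhibitory weights onto over-active excitatory cells are potentiated and those onto silent cells are depressed, which is what drives the network towards the balanced AI regime.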
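The surrogate oriented spike fronts could, for example, be generated as follows; the grid size, sweep speed, and one-event-per-pixel simplification are assumptions for illustration, not the actual stimulus parameters:

```python
import numpy as np

def oriented_spike_front(size=16, angle_deg=0, speed=32.0):
    """Emit one DVS-like event per pixel as a straight contrast edge
    sweeps across a size x size grid along direction angle_deg
    (degrees) at `speed` pixels per second.  Returns (t, x, y) tuples
    sorted by time.  Parameter values are illustrative only."""
    theta = np.radians(angle_deg)
    d = np.array([np.cos(theta), np.sin(theta)])  # direction of motion
    xs, ys = np.meshgrid(np.arange(size), np.arange(size))
    # Signed distance of each pixel along the motion direction,
    # shifted so the front enters the grid at t = 0.
    proj = xs * d[0] + ys * d[1]
    proj -= proj.min()
    events = [(float(p / speed), int(x), int(y))
              for x, y, p in zip(xs.ravel(), ys.ravel(), proj.ravel())]
    events.sort()
    return events
```

Drawing `angle_deg` from the eight predefined directions at 5 sweeps per second then yields an event stream resembling the DVS response to moving oriented lines.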
We exposed the network to up to 1 hour of such oriented contrast edges, at 5 edges per second, with the orientation picked randomly from 0, 45, 90, 135, 180, 225, 270 and 315 degrees. For each spike in the E population, we computed the average reverse correlation with each input pixel at different time lags. Although some cells clearly showed a preference for certain orientations and motion directions, no clear pattern has yet emerged that would demonstrate learning of V1-like receptive fields (Fig. 1). Further work will therefore concentrate on exploring the parameter space of the network and the learning rule, as well as the presentation statistics of the stimuli, to support receptive field formation. Ultimately, the network should learn V1-like receptive fields from long exposure to DVS recordings. We hope our results will further the understanding of the mechanisms of receptive field emergence and of efficient event-based vision in general.
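The reverse-correlation analysis amounts to a spike-triggered average of the input events at several time lags before each output spike. A sketch under assumed grid size, lags, and window length (not the actual analysis parameters):

```python
import numpy as np

def spike_triggered_average(input_events, spike_times, size=16,
                            lags=(0.00, 0.01, 0.02), window=0.005):
    """Reverse correlation: for each lag, accumulate the per-pixel input
    event count in a short window preceding each output spike, then
    average over spikes.  input_events are (t, x, y) tuples; shapes and
    parameter values are illustrative only."""
    sta = np.zeros((len(lags), size, size))
    ev_t = np.array([e[0] for e in input_events])
    ev_x = np.array([e[1] for e in input_events])
    ev_y = np.array([e[2] for e in input_events])
    for s in spike_times:
        for i, lag in enumerate(lags):
            # input events falling within `window` before (spike - lag)
            mask = (ev_t >= s - lag - window) & (ev_t < s - lag)
            np.add.at(sta[i], (ev_y[mask], ev_x[mask]), 1.0)
    if len(spike_times):
        sta /= len(spike_times)
    return sta
```

A cell with a V1-like receptive field would show an oriented, localised structure in these lag-resolved averages rather than a diffuse blob.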