The central complex of insects contains cells, organised as a ring attractor, that encode head direction. The 'bump' of activity in the ring can be updated by both idiothetic cues and external sensory information. Plasticity at the synapses between these cells and the ring neurons, which bring sensory information into the central complex, has been proposed to form a mapping between visual cues and the heading estimate, allowing the current heading to be tracked more accurately than if only idiothetic information were used. In Drosophila, ring neurons have well-characterised non-linear receptive fields. In this work, we produce synthetic versions of these visual receptive fields using a combination of excitatory inputs and mutual inhibition between ring neurons. We use these receptive fields to bring visual information into a spiking neural network model of the insect central complex based on the recently published Drosophila connectome. Previous modelling work has focused on how this circuit functions as a ring attractor using the same type of simple visual cues that are commonly used in experiments. While we initially test the model on these simple stimuli, we then apply it to complex natural scenes containing multiple conflicting cues. We show that the simple visual filtering provided by the ring neurons is sufficient to form a mapping between heading and visual features and to maintain the heading estimate in the absence of angular velocity input. The network successfully tracks heading even when presented with videos of natural scenes containing conflicting information from environmental changes and translation of the camera.
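
The idea of a learned mapping between visual cues and the heading estimate can be illustrated with a toy rate-based sketch. This is not the spiking, connectome-based model described above. The Gaussian receptive fields, the single point-like cue, the plain Hebbian outer-product rule, and all names and parameter values (`bump`, `ring_response`, `learn_mapping`, `decode`, `cue_world`, `kappa`, `rf_width`, `lr`) are assumptions chosen for illustration only.

```python
import numpy as np

def wrap(a):
    """Wrap angles to the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def bump(theta, prefs, kappa=4.0):
    """Von Mises-shaped activity bump over heading cells, centred on theta."""
    return np.exp(kappa * (np.cos(prefs - theta) - 1.0))

def ring_response(theta, cue_world, rf_centres, rf_width=0.4):
    """Hypothetical ring-neuron responses to a single visual cue.

    The cue is fixed at cue_world in world coordinates, so it appears at
    azimuth (cue_world - theta) relative to the head. Each ring neuron has
    an assumed Gaussian receptive field in azimuth.
    """
    azimuth = wrap(cue_world - theta)
    d = wrap(rf_centres - azimuth)
    return np.exp(-0.5 * (d / rf_width) ** 2)

def learn_mapping(n_heading=16, n_ring=32, cue_world=0.7,
                  n_samples=720, lr=0.01):
    """Associate visual input with heading using a Hebbian outer product.

    Assumption: during learning the bump follows the true heading, for
    example because it is driven by idiothetic input.
    """
    prefs = np.linspace(-np.pi, np.pi, n_heading, endpoint=False)
    centres = np.linspace(-np.pi, np.pi, n_ring, endpoint=False)
    W = np.zeros((n_heading, n_ring))
    for theta in np.linspace(-np.pi, np.pi, n_samples, endpoint=False):
        h = bump(theta, prefs)                      # heading-cell activity
        v = ring_response(theta, cue_world, centres)  # ring-neuron activity
        W += lr * np.outer(h, v)                    # co-activity strengthens weights
    return W, prefs, centres

def decode(W, prefs, v):
    """Estimate heading from visual input alone via a population vector."""
    h = W @ v
    return np.angle(np.sum(h * np.exp(1j * prefs)))

if __name__ == "__main__":
    W, prefs, centres = learn_mapping()
    true_heading = 1.5
    v = ring_response(true_heading, 0.7, centres)
    print("true:", true_heading, "decoded:", decode(W, prefs, v))
```

After learning, the visual input on its own is enough to place the bump, which is the sense in which such a mapping could maintain the heading estimate without angular velocity input. A biologically faithful version would need spiking dynamics, recurrent ring-attractor connectivity, and the sign and form of plasticity appropriate to the real synapses, none of which are modelled here.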