Developing an energy-efficient artificial sensory system is of great significance for neuroprosthetics, neurorobotics, and intelligent human–machine interfaces. Inspired by biological perception, achieving this goal through spatiotemporal processing is viable. However, challenges such as continuous signal coding, which results in high energy consumption, remain unsolved and hinder the emulation of human perception. Herein, a perceptual simulation enabled by spike-based spatiotemporal processing, analogous to biological behavior, is demonstrated through an NbOx-based oscillation neuron. The time difference between distinct inputs has a notable impact on the output spiking activity of the oscillation neuron. On the basis of these features, temporally related perceptions, for example, direction selectivity and sound localization, are closely imitated, with directions or azimuths unambiguously differentiated according to the number of output spikes of the oscillation neuron. Furthermore, by combining the oscillation neuron with a spiking neural network, azimuth recognition is conceptually established to mimic the human response to auditory stimuli. These results prove the feasibility of employing the spike-based spatiotemporal processing of oscillation neurons to emulate sensory functionality, paving a highly promising way toward realizing energy-efficient artificial sensory systems.
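The central mechanism described above (the number of output spikes encoding the time difference between two inputs) can be illustrated with a generic leaky integrate-and-fire neuron. This is a minimal sketch only: the LIF model is a stand-in for the NbOx relaxation oscillator, and all parameter values below are illustrative assumptions, not device measurements from the paper.

```python
def simulate_lif(delay_ms, t_total=80.0, dt=0.01,
                 tau=10.0, v_th=1.0, amp=1.2, width=20.0):
    """Leaky integrate-and-fire neuron driven by two current pulses.

    The second pulse starts delay_ms after the first. Returns the total
    number of output spikes, which here encodes the inter-input delay.
    All units (ms, arbitrary amplitude) are illustrative only.
    """
    v, spikes = 0.0, 0
    for k in range(int(t_total / dt)):
        t = k * dt
        # Total input current: two rectangular pulses, the second delayed.
        i = (amp if 0.0 <= t < width else 0.0) \
          + (amp if delay_ms <= t < delay_ms + width else 0.0)
        v += dt * (i - v) / tau      # leaky integration toward the input
        if v >= v_th:                # threshold crossing -> emit a spike
            spikes += 1
            v = 0.0                  # reset, mimicking relaxation
    return spikes

# Coincident pulses superpose and drive the membrane over threshold
# more often than well-separated pulses do:
print(simulate_lif(0.0), simulate_lif(30.0))   # coincident vs. separated
```

Because overlapping inputs sum before integration, the spike count falls as the inter-input delay grows, giving a toy analogue of the delay-to-spike-count mapping used for direction selectivity and azimuth discrimination.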