Mammals exhibit a remarkable capability to detect and localize sound sources in complex acoustic environments using binaural cues encoded as spikes. Emulating the mammalian auditory process for sound source localization (SSL), we propose a computational model for accurate and robust SSL under the neuromorphic spiking neural network (SNN) framework. At the center of this model is a Multi-Tone Phase Coding (MTPC) scheme, which encodes the interaural time difference (ITD) between binaural pure tones into discriminative spike patterns that can be directly classified by SNNs. As such, SSL can be implemented as an event-driven task on highly efficient, neuromorphic parallel processors. We evaluate the proposed computational model on a directional audio dataset recorded from a microphone array in a realistic acoustic environment with background noise, obstruction, reflection, and other interferences. We report superior localization capability with a mean absolute error (MAE) of 1.02°, or 100% classification accuracy at an angular resolution of 5°, which surpasses other biologically plausible SNN-based neuromorphic approaches by a relatively large margin and is on par with human performance on similar tasks. This study opens up many application opportunities in human-robot interaction, where energy efficiency is crucial. As a case study, we successfully deploy the proposed SSL system on a robotic platform to track a speaker and orient the robot's attention.
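To give a flavor of how phase coding can capture an ITD, the following Python sketch (not the paper's MTPC implementation; the function names, tone frequency, and toy signals are illustrative assumptions) encodes the phase of a pure tone at each ear as a spike time within one period of that tone, so that the per-tone spike-time difference between the two ears recovers the ITD.

```python
import numpy as np

def phase_to_spike_time(signal, fs, tone_hz):
    """Estimate the phase of `tone_hz` in `signal` and map it to a spike
    time within one period of that tone (illustrative phase coding)."""
    t = np.arange(len(signal)) / fs
    # Complex projection onto the tone frequency yields its phase.
    phase = np.angle(np.sum(signal * np.exp(-2j * np.pi * tone_hz * t)))
    period = 1.0 / tone_hz
    # Map phase in [-pi, pi) to a spike time in [0, period).
    return (phase % (2 * np.pi)) / (2 * np.pi) * period

def spike_pattern(left, right, fs, tones_hz):
    """For each tone, emit (tone, left spike time, right spike time);
    the spike-time difference per tone reflects the ITD."""
    return [(f,
             phase_to_spike_time(left, fs, f),
             phase_to_spike_time(right, fs, f))
            for f in tones_hz]

# Toy example: a 500 Hz tone reaching the right ear 0.3 ms later.
fs, itd = 16000, 3e-4
t = np.arange(0, 0.1, 1 / fs)
left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - itd))
for f, t_l, t_r in spike_pattern(left, right, fs, [500]):
    print(f"{f} Hz: spike-time difference = {(t_l - t_r) * 1e6:.1f} us")
```

Repeating this over several tone frequencies, as the multi-tone scheme suggests, yields a spike pattern whose relative timings vary systematically with source direction, which is what makes it directly classifiable by an SNN.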