A fully event-based image processing pipeline that combines neuromorphic vision sensors with spiking neural networks (SNNs) has the potential to achieve high-throughput, low-latency, and high-dynamic-range vision processing. In this work, we present an end-to-end SNN framework for unsupervised learning and inference that achieves near-real-time processing performance. The design uses fully event-driven operations that significantly speed up both learning and inference: it achieves more than a 100× increase in inference throughput on CPU and near-real-time inference on GPU for neuromorphic vision sensors. The event-driven processing method supports unsupervised spike-timing-dependent plasticity (STDP) learning of convolutional SNNs. When labels are limited, the framework achieves higher accuracy than supervised training approaches. In addition, the proposed method improves the robustness of low-precision SNNs: it reduces spiking-activity distortion and achieves higher learning accuracy than low-precision networks simulated with conventional discrete time steps.