In an iSCSI-based storage area network, target hosts serve concurrent I/O requests from initiators and must deliver both high throughput and low latency. Existing iSCSI implementations rely on the OS page cache for data sharing and reuse. However, the non-uniform memory access (NUMA) architecture adds another dimension of complexity: asymmetric memory access on multi-core and many-core platforms. On a NUMA platform, an iSCSI target often dispatches a request that hits the cache to an I/O thread running on a node remote from the cached data, and thus cannot fully utilize the multi-core system. We encountered this problem in the context of ultra-high-speed data transfer between two iSCSI storage systems, where slow remote NUMA memory access cannot keep pace with the available network bandwidth and thereby becomes the bottleneck of the end-to-end data transfer path. We design a NUMA-aware cache mechanism that aligns cache memory with local NUMA nodes and threads, and then schedules I/O requests to the threads that are local to the data being accessed. This NUMA-aware solution results in lower access latency and higher system throughput. We implement the cache system within the Linux SCSI target framework and evaluate it on our NUMA-based iSCSI testbed. Experimental results show that the NUMA-aware cache significantly improves iSCSI performance as measured by several benchmark tools, and confirm its viability for data-intensive applications and real-life workloads.
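
The scheduling idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the class name `NumaAwareDispatcher`, the per-node queues, and the modulo placement policy for cache misses are all assumptions made for illustration. The sketch only shows the core decision, namely routing a request to the NUMA node that already holds the cached block.

```python
from collections import deque


class NumaAwareDispatcher:
    """Hypothetical sketch: route each block request to the NUMA node caching it."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # block number -> NUMA node whose memory holds the cached copy
        self.block_home = {}
        # one request queue per NUMA node; in a real target, each queue would be
        # drained by I/O threads pinned to that node (not modeled here)
        self.queues = [deque() for _ in range(num_nodes)]

    def dispatch(self, block):
        """Enqueue a request for `block`; return (node, cache_hit)."""
        node = self.block_home.get(block)
        hit = node is not None
        if not hit:
            # Assumed placement policy for misses: spread blocks by modulo.
            node = block % self.num_nodes
            self.block_home[block] = node
        # Hit or miss, the request goes to the node that is local to the data.
        self.queues[node].append(block)
        return node, hit


dispatcher = NumaAwareDispatcher(num_nodes=2)
first = dispatcher.dispatch(5)   # miss: block 5 is placed on node 5 % 2
second = dispatcher.dispatch(5)  # hit: routed to the same node as before
```

A real target would also need to pin worker threads to their nodes and allocate cache pages from node-local memory; both steps are omitted here, and the sketch models neither eviction nor concurrency.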