Traditional facial recognition methods depend on large numbers of training samples because extensive tuning of synaptic weights is required for low-level feature extraction. In prior work, a brain-inspired model of visual recognition memory suggested that grid cells encode the translation vectors, corresponding to saccadic eye movements, between salient stimulus features. With a small training set for each recognition category, the relative positions of the selected features in each image were represented by grid cells and feature-label cells linked through Hebbian learning. However, this model is suitable only for the recognition of familiar faces, objects, and scenes; its performance on a given face with unfamiliar facial expressions was unsatisfactory. In this study, an improved grid-cell-based computational model for facial recognition is proposed. Here, the initial hypothesis about stimulus identity is obtained using the histogram of oriented gradients (HOG) algorithm. HOG descriptors effectively capture the edge and gradient structure of each sample. As a result, most test samples were successfully recognized within three saccades, and both the probability of a false hypothesis and the average number of fixations required for successful recognition were reduced. Compared with other neural network models, such as convolutional neural networks and deep belief networks, the proposed method achieves the best performance with only one training sample per face. Moreover, it is robust to image occlusion and to variations in size or scale. Our results may provide insight into efficient neural-network-based recognition from small training sets.
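
As a rough illustration of the HOG-based hypothesis step, the sketch below is an assumption of ours rather than the published implementation: it computes a HOG descriptor for a probe face and selects the nearest single training image per identity. The function names (hog_descriptor, initial_hypothesis), the common image size, and the cell/block parameters are illustrative choices made with scikit-image; the label returned would only seed the subsequent saccade-driven, grid-cell-based verification stage.

    # Minimal sketch (assumed, not the authors' code): form an initial identity
    # hypothesis by nearest-neighbour matching of HOG descriptors, with one
    # training image per face, using scikit-image.
    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize

    def hog_descriptor(image, size=(128, 128)):
        # Resize to a common shape so descriptors are comparable, then compute
        # gradient-orientation histograms over local cells.
        image = resize(image, size, anti_aliasing=True)
        return hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm='L2-Hys')

    def initial_hypothesis(test_image, gallery):
        # gallery: {identity: single grayscale training image}
        # Return the identity whose HOG descriptor is closest in Euclidean
        # distance to the probe descriptor.
        probe = hog_descriptor(test_image)
        distances = {name: np.linalg.norm(probe - hog_descriptor(img))
                     for name, img in gallery.items()}
        return min(distances, key=distances.get)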