Driver gaze has been shown to be an excellent surrogate for driver attention in intelligent vehicles. With the recent surge of highly autonomous vehicles, driver gaze can be useful for determining the handoff time to a human driver. While personalized driver gaze zone estimation systems have improved significantly, a generalized system that is invariant to different subjects, perspectives, and scales is still lacking. We take a step toward this generalized system using convolutional neural networks (CNNs). We fine-tune four popular CNN architectures for this task and provide extensive comparisons of their outputs. We additionally experiment with different input image patches and examine how image size affects performance. For training and testing the networks, we collect a large naturalistic driving dataset comprising 11 long drives, driven by 10 subjects in two different cars. Our best-performing model achieves an accuracy of 95.18% during cross-subject testing, outperforming current state-of-the-art techniques for this task. Finally, we evaluate our best-performing model on the publicly available Columbia gaze dataset, which comprises images from 56 subjects with varying head pose and gaze directions. Without any training on this dataset, our model successfully encodes the different gaze directions, demonstrating good generalization capabilities.