Convolutional neural networks (CNNs) have been widely used for seismic fault segmentation and outperform conventional attribute-based methods, producing fault maps with noise-free and continuously trackable fault features. However, CNN-based methods can generalize poorly to field seismic images, and the factors that affect fault segmentation remain incompletely studied or unexplored. Moreover, the existing pixel-wise metrics, borrowed from natural image segmentation tasks, cannot fairly or reasonably evaluate fault segmentation results. We first develop a distance-based metric that provides a geologically more reasonable evaluation of fault interpretation. We then use the most commonly used U-net architecture as an example to study how CNN-based fault segmentation is affected by several significant factors: the training data, various network hyperparameters, and scaling and rotation in the inference step. Experimental results show that a training data set with more realistic reflection features and multiple sampling rates enriches the variations in structure and waveform signatures, thus significantly enhancing the fault segmentation. Moreover, a novel loss function that we develop outperforms the others by notable margins. Last, but not least, it is necessary to apply a test-time augmentation strategy that merges predictions at multiple scales and rotations in the inference step, because the CNN is not invariant to these transformations. Based on these studies, we optimally train a properly designed CNN and apply it to multiple field examples, where we obtain accurate, clean, and continuous fault detections and quantitatively evaluate them against manual interpretations.
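As an illustration of the test-time augmentation idea described above, the following is a minimal sketch rather than the implementation used in this work. It assumes a 2D seismic image, a hypothetical `predict` callable that maps an image to a fault-probability map of the same shape, rotations restricted to multiples of 90 degrees, nearest-neighbour rescaling, and merging by simple averaging. All function names, parameters, and the averaging rule are assumptions made for the example.

```python
import numpy as np

def rescale_nearest(image, out_shape):
    """Resample a 2D array to out_shape by nearest-neighbour lookup.
    This is a stand-in for a proper interpolation routine."""
    rows = (np.arange(out_shape[0]) * image.shape[0] / out_shape[0]).astype(int)
    cols = (np.arange(out_shape[1]) * image.shape[1] / out_shape[1]).astype(int)
    rows = np.minimum(rows, image.shape[0] - 1)
    cols = np.minimum(cols, image.shape[1] - 1)
    return image[np.ix_(rows, cols)]

def predict_with_tta(predict, image, scales=(0.5, 1.0, 2.0), quarter_turns=(0, 1, 2, 3)):
    """Average predictions over several scales and 90-degree rotations.
    `predict` is a hypothetical callable: 2D image -> 2D probability map."""
    h, w = image.shape
    merged = np.zeros((h, w), dtype=float)
    count = 0
    for s in scales:
        scaled_shape = (max(1, round(h * s)), max(1, round(w * s)))
        scaled = rescale_nearest(image, scaled_shape)
        for k in quarter_turns:
            rotated = np.rot90(scaled, k)          # transform the input
            prob = predict(rotated)                # run the network
            prob = np.rot90(prob, -k)              # undo the rotation
            merged += rescale_nearest(prob, (h, w))  # undo the scaling
            count += 1
    return merged / count                          # assumed merge rule: mean
```

In practice `predict` would wrap a trained network, and arbitrary rotation angles or smoother interpolation would require a dedicated image-resampling routine. Other merge rules, such as a pixel-wise maximum, are equally plausible choices.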