Convolutional neural networks (CNNs) have been widely employed for seismic fault segmentation and outperform conventional attribute-based methods in producing fault maps with noise-free, continuously trackable fault features. However, CNN-based methods can generalize poorly to field seismic images, and the factors affecting fault segmentation remain incompletely studied. Moreover, the existing pixel-wise metrics, borrowed from natural image segmentation tasks, cannot fairly or reasonably evaluate fault segmentation results. We first propose a distance-based metric that provides a geologically more reasonable evaluation of fault interpretation. We then use the commonly used U-net architecture as an example to study how CNN-based fault segmentation is affected by significant factors such as the training data, various network hyperparameters, and scaling and rotation in the inference step. Experimental results show that a training dataset with more realistic reflection features and multiple sampling rates can significantly enhance fault segmentation. In addition, a novel loss function that we propose outperforms the others by notable margins. Finally, it is necessary to merge predictions at multiple scales and rotations in the inference step because the CNN is not invariant to these transformations. Based on these studies, we optimally train a properly designed CNN and apply it to multiple field examples, where we obtain accurate, clean, and continuous fault detections and quantitatively evaluate them against manual interpretations.
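As an illustration of merging predictions over rotations in the inference step, the following is a minimal sketch and not the implementation used in this work. It assumes that merging means averaging fault probabilities, that only rotations by multiples of 90 degrees are used, and that the trained network is available as a callable `predict` that maps a 2D image to a same-sized probability map. The names `rot90`, `merge_rotated_predictions`, and `predict` are hypothetical. Merging over scales would follow the same pattern with a resampling step, which is omitted here.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def rot90(img: Image, k: int = 1) -> Image:
    """Rotate a 2D list counterclockwise by k * 90 degrees (k may be negative)."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counterclockwise quarter turn.
        img = [list(row) for row in zip(*img)][::-1]
    return [list(row) for row in img]


def merge_rotated_predictions(
    image: Image,
    predict: Callable[[Image], Image],  # hypothetical stand-in for the trained CNN
    ks: Sequence[int] = (0, 1, 2, 3),
) -> Image:
    """Average the predictions made on rotated copies of the input.

    Each prediction is rotated back to the original orientation before
    averaging. Averaging is an assumed merge rule, not one taken from the text.
    """
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    for k in ks:
        prob = predict(rot90(image, k))  # predict in the rotated frame
        prob = rot90(prob, -k)           # undo the rotation
        for i in range(height):
            for j in range(width):
                total[i][j] += prob[i][j]
    return [[value / len(ks) for value in row] for row in total]
```

A predictor that is not rotation invariant returns different maps for the four orientations, so the averaged map differs from any single prediction. For a perfectly invariant predictor, the merge leaves the result unchanged.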