Zero-shot learning (ZSL) has been actively studied for image classification to relieve the burden of annotating image labels. Although semantic segmentation requires even more labor-intensive pixel-wise annotation, zero-shot semantic segmentation has attracted far less research interest. We therefore focus on zero-shot semantic segmentation, which aims to segment unseen objects given only category-level semantic representations for the unseen categories. In this article, we propose a novel context-aware feature generation network (CaGNet) that synthesizes context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to fine-tune the classifier, enabling segmentation of unseen objects. Furthermore, we extend pixel-wise feature generation and fine-tuning to patch-wise feature generation and fine-tuning, which additionally captures inter-pixel relationships. Experimental results on Pascal-VOC, Pascal-Context, and COCO-Stuff show that our method significantly outperforms existing zero-shot semantic segmentation methods.
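To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the core idea: a generator maps a category-level semantic embedding together with a pixel-wise contextual latent code to a synthetic visual feature, and the synthesized features for unseen categories are then used to fine-tune the classifier. All module names, dimensions, and the class split below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Hypothetical generator: semantic embedding + contextual code -> visual feature."""
    def __init__(self, sem_dim=300, ctx_dim=128, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sem_dim + ctx_dim, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim),
        )

    def forward(self, sem_emb, ctx_code):
        # sem_emb: (N, sem_dim) category-level word embeddings
        # ctx_code: (N, ctx_dim) pixel-wise contextual latent codes
        return self.net(torch.cat([sem_emb, ctx_code], dim=1))

# Assumed setup: 20 classes in total, of which classes 15-19 are unseen.
generator = FeatureGenerator()
classifier = nn.Linear(512, 20)
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-3)

unseen_emb = torch.randn(64, 300)           # placeholder unseen-class word vectors
unseen_lbl = torch.randint(15, 20, (64,))   # assumed unseen-class label range
ctx_code = torch.randn(64, 128)             # contextual codes sampled at generation time

# Synthesize pixel-wise features for unseen categories (generator kept fixed here).
with torch.no_grad():
    fake_feat = generator(unseen_emb, ctx_code)

# Fine-tune the classifier on the synthesized features so it can label unseen classes.
loss = nn.functional.cross_entropy(classifier(fake_feat), unseen_lbl)
loss.backward()
optimizer.step()
```

In practice the contextual codes would be inferred from real images rather than sampled at random, and the patch-wise extension would synthesize small spatial grids of features instead of independent pixels; this sketch only outlines the generate-then-fine-tune loop.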