Urban settings are dynamic and constantly changing, and they present a wide range of surface materials with high spatial and spectral variability. As a result, mapping urban growth, evaluating infrastructure, managing water resources, and monitoring natural land cover become more complex tasks. Urban applications have made considerable progress thanks to the abundance of very high resolution (VHR) orbital data and recent advances in artificial intelligence, especially neural networks. Convolutional neural networks (CNNs) have the potential to significantly enhance urban land cover analysis by addressing the limitations of traditional techniques, and U-Net is a popular CNN architecture for land cover analysis in remote sensing imagery. This study presents a U-Net-based CNN model for semantic segmentation of an urban study area that uses both the spectral and spatial context of VHR satellite data. The proposed model is trained, validated, and tested on classifying VHR satellite imagery into five urban classes: water, vegetation, bare soil, road, and building. For validation and stability evaluation, the CNN semantic segmentation results are compared with those of maximum likelihood classification. Both classified scenes are assessed with a confusion matrix built from 400 random points and their corresponding ground truth, from which the overall accuracy, producer's and user's accuracies, and Kappa coefficient are derived. The U-Net semantic segmentation achieved an overall accuracy of 87.50% and a Kappa coefficient of 0.8395, outperforming maximum likelihood classification, which achieved an overall accuracy of 83.25% and a Kappa coefficient of 0.7812.
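The accuracy measures named above can all be derived from a single confusion matrix. The following is a minimal sketch, not the study's own code. It assumes that rows hold the reference (ground-truth) classes and columns hold the classified classes, a convention the abstract does not state, and it computes the Kappa coefficient as Cohen's kappa. The function name `accuracy_assessment` and the toy matrix are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence


def accuracy_assessment(matrix: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Overall, producer's, and user's accuracy plus Cohen's kappa.

    Assumed convention (hypothetical helper): rows = reference (ground truth)
    classes, columns = classified (predicted) classes.
    """
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("confusion matrix must be square")
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")

    diag: List[int] = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]  # reference totals per class
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # classified totals

    # Overall accuracy: fraction of check points on the diagonal.
    overall = sum(diag) / n
    # Producer's accuracy: correct / reference total (complement of omission error).
    producers = [d / t if t else float("nan") for d, t in zip(diag, row_totals)]
    # User's accuracy: correct / classified total (complement of commission error).
    users = [d / t if t else float("nan") for d, t in zip(diag, col_totals)]
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the products of the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1 - expected)

    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical 3-class toy matrix (150 points); NOT the study's data.
    toy = [[50, 8, 2],
           [6, 40, 4],
           [0, 5, 35]]
    m = accuracy_assessment(toy)
    print(f"OA = {m['overall']:.4f}, kappa = {m['kappa']:.4f}")  # OA = 0.8333, kappa = 0.7475
```

In the study's setting, the matrix would be 5 × 5 (water, vegetation, bare soil, road, building) and filled from the 400 random check points. Swapping the row and column convention simply exchanges producer's and user's accuracies.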