Advances in deep learning and computer vision are making significant contributions to flood mapping, particularly when integrated with remotely sensed data. Although existing supervised methods, especially deep convolutional neural networks, have proved effective, they require intensive manual labeling of flooded pixels to train a multi-layer network that learns abstract semantic features of the input data. This research introduces a novel weakly supervised approach to pixel-wise flood mapping that leverages multi-temporal remote sensing imagery and image processing techniques (e.g., the Normalized Difference Water Index and edge detection) to create weakly labeled data. Using these weakly labeled data, we propose and train a bi-temporal U-Net model for flood detection without the need for time-consuming and labor-intensive human annotations. Using floods from Hurricanes Florence and Harvey as case studies, we evaluated the performance of the proposed bi-temporal U-Net model and baseline models, such as decision tree, random forest, gradient boosting, and adaptive boosting classifiers. The evaluation (1) covered multiple test sites with varying degrees of urbanization and (2) used both bi-temporal (i.e., pre- and post-flood) and uni-temporal (i.e., post-flood only) input. The experimental results showed that the proposed framework of weakly labeled data generation and the bi-temporal U-Net could produce near real-time urban flood maps with consistently high precision, recall, F1 score, IoU, and overall accuracy relative to baseline machine learning algorithms.
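To make the weak-labeling idea and the reported metrics concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the McFeeters form of the Normalized Difference Water Index, NDWI = (Green − NIR) / (Green + NIR), and a hypothetical threshold of 0 to separate water from non-water. It omits the edge detection step entirely and represents each image as a flat list of per-pixel reflectances. The function names (`ndwi`, `weak_flood_labels`, `scores`) are illustrative assumptions.

```python
def ndwi(green, nir):
    """Per-pixel NDWI (McFeeters form): (green - nir) / (green + nir)."""
    return [(g - n) / (g + n) if (g + n) != 0 else 0.0
            for g, n in zip(green, nir)]


def weak_flood_labels(pre_green, pre_nir, post_green, post_nir, threshold=0.0):
    """Weak label = 1 where a pixel looks like water after the event
    but not before it (so permanent water is excluded).
    The threshold of 0.0 is an assumed default, not a value from the paper."""
    pre = ndwi(pre_green, pre_nir)
    post = ndwi(post_green, post_nir)
    return [int(b > threshold and a <= threshold) for a, b in zip(pre, post)]


def scores(pred, truth):
    """Precision, recall, F1, IoU, and overall accuracy for binary labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    accuracy = (tp + tn) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "iou": iou, "accuracy": accuracy}


# Toy four-pixel scene: pixel 2 is permanent water, pixels 0 and 3 flood.
pre_green, pre_nir = [0.1, 0.1, 0.3, 0.1], [0.4, 0.4, 0.1, 0.4]
post_green, post_nir = [0.3, 0.1, 0.3, 0.3], [0.1, 0.4, 0.1, 0.1]
labels = weak_flood_labels(pre_green, pre_nir, post_green, post_nir)
print(labels)  # [1, 0, 0, 1]
print(scores(labels, [1, 0, 0, 0]))
```

In practice the bands would be held in an array library and the threshold tuned per scene. The point of the sketch is only that comparing pre- and post-event water masks can yield training labels without manual annotation.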