Online ride-hailing services play a crucial role in daily transportation. However, service access remains limited in certain regions, and drivers encounter difficulties in receiving orders. Accurate prediction of short-term origin-destination (OD) demand is essential for addressing these issues. Drawing on recent advances in artificial intelligence and big data, this study introduces a spatiotemporal encoder-decoder network with a residual feature extractor (RF-STED) for short-term OD demand prediction in online ride-hailing services. The RF-STED model, built on deep learning components such as graph convolutional networks and convolutional long short-term memory (Conv-LSTM), comprises a spatiotemporal network, an encoding layer, and a residual feature extractor. The spatiotemporal network has two branches. The first branch processes multi-pattern OD data with a multi-pattern temporal feature extraction module, which uses a multi-channel Conv-LSTM to capture temporal correlations. The second branch uses a multi-spatial feature extraction module to convert the associations between OD pairs into a spatial topology, from which it extracts multi-spatial correlations. The encoding layer captures spatiotemporal dependencies, while the residual feature extractor decodes the compressed vectors back into an OD graph for forecasting future demand. Experiments on a taxi dataset from Manhattan, U.S., show that the RF-STED model outperforms 10 baseline models and four ablation models. The results highlight the model's effectiveness and robustness in short-term OD flow prediction.
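To make the prediction target concrete, the following minimal sketch shows one way short-term OD demand could be represented as a tensor of trip counts per time interval, origin zone, and destination zone. The function name `build_od_tensor`, the zone and interval indexing, and the toy trip records are assumptions for illustration only; they are not taken from the study's data pipeline or from the RF-STED implementation.

```python
import numpy as np

def build_od_tensor(trips, num_zones, num_intervals):
    """Count trips per (time interval, origin zone, destination zone).

    `trips` is an iterable of (interval, origin, destination) integer
    indices. This layout is a hypothetical choice for illustration.
    """
    od = np.zeros((num_intervals, num_zones, num_zones), dtype=np.int64)
    for interval, origin, destination in trips:
        od[interval, origin, destination] += 1
    return od

# Toy records (made up): two trips 0 -> 1 and one trip 2 -> 0 in
# interval 0, and one trip 1 -> 2 in interval 1.
trips = [(0, 0, 1), (0, 0, 1), (0, 2, 0), (1, 1, 2)]
od = build_od_tensor(trips, num_zones=3, num_intervals=2)

# A short-term predictor would take the OD matrices of past intervals
# as input and estimate the OD matrix of the next interval.
history, target = od[:-1], od[-1]
print(history.shape, target.shape)
```

In this representation, each slice `od[t]` is the OD matrix for interval `t`, so short-term prediction amounts to estimating the next slice from the preceding ones.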