Origin-destination demand prediction is a critical task in intelligent transportation systems. However, accurately modeling the complex spatial-temporal dependencies is challenging, because they arise from spatial, temporal, and external influences such as geographical features, weather conditions, and traffic incidents. Moreover, the dependencies are multi-scale: local and global in space, and short-term and long-term in time, which further complicates the task. To address these challenges, a novel framework called the Spatial-Temporal Memory Enhanced Multi-Level Attention Network (ST-MEN) is proposed. The framework consists of three key components. First, an external attention mechanism efficiently incorporates external factors into the prediction process. Second, a dynamic spatial feature extraction module captures the spatial dependencies among nodes; by incorporating two skip-connections, this module preserves the original node information while aggregating information from other nodes. Third, a temporal feature extraction module captures both continuous and discrete temporal dependencies using a hierarchical memory network. In addition, cascade fusion of multi-scale features is incorporated to enhance the performance of the model. To evaluate the effectiveness of the proposed model, extensive experiments are conducted on two real-world datasets. The experimental results demonstrate that ST-MEN achieves excellent prediction accuracy, with a maximum improvement of 19.1%.
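
As an illustration of the idea behind the dynamic spatial feature extraction module described above, the following minimal sketch aggregates information across nodes with data-dependent weights while keeping the original node features through two skip-connections. It is not the ST-MEN implementation: the scaled dot-product similarity, the `tanh` nonlinearity, the placement of the two skip-connections, and the function name `aggregate_with_skips` are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_with_skips(x):
    """Hypothetical spatial aggregation with two skip-connections.

    x: list of n node feature vectors, each a list of d floats.
    Returns (out, weights), where weights is the n x n dynamic adjacency.
    """
    d = len(x[0])
    # Dynamic adjacency (assumed form): scaled dot-product similarity
    # between node features, normalised per row with softmax.
    scores = [
        [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in x]
        for xi in x
    ]
    weights = [softmax(row) for row in scores]
    # Aggregate information from all nodes using the dynamic weights.
    agg = [
        [sum(w * xj[k] for w, xj in zip(row, x)) for k in range(d)]
        for row in weights
    ]
    # Skip-connection 1 (assumed placement): add the original features
    # to the aggregated ones.
    h = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, agg)]
    # Skip-connection 2 (assumed placement): add the original features
    # again after the nonlinearity, so they are never overwritten.
    out = [[a + math.tanh(b) for a, b in zip(xi, hi)] for xi, hi in zip(x, h)]
    return out, weights
```

Calling `aggregate_with_skips([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns one updated feature vector per node together with the row-normalised weight matrix. Because the original features are added back at both stages, each output stays anchored to its own node even when the aggregation weights spread across many neighbours.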