Sequential recommendation has become a critical task in many application scenarios as people’s online activities increase. To predict the next item a user may be interested in, it is necessary to take both the general and the dynamic preferences of the user into account. Existing approaches typically integrate user–item or item–item feature interactions directly, without considering the dynamic changes in the user’s long-term and short-term preferences, which limits the capability of the model. To address these issues, we propose a novel unified framework for the sequential recommendation task that models users’ long- and short-term sequential behaviors at each time step and captures higher-order item-to-item dependencies through a hierarchical attention mechanism. The proposed model considers dynamic long- and short-term user preferences simultaneously, and a joint learning mechanism is introduced to fuse them for better recommendation. We extensively evaluate our model against several state-of-the-art methods using different evaluation metrics on three real-world datasets. The experimental results demonstrate that our approach significantly improves over the compared models.