Background: It is often challenging for physicians to decide on the duration of anticoagulation treatment for patients with venous thromboembolism (VTE), as they need to weigh the risk of recurrent thrombosis against the risk of bleeding. Several clinical models have been developed to help physicians identify patients at high risk of bleeding. However, these tools use only baseline clinical information and cannot incorporate clinical conditions and events that occur over time and may influence the risk of bleeding. Capturing the patterns and relationships in continuously changing time series of clinical data could therefore be a more powerful approach to developing predictive models than relying on baseline clinical information alone. Nonetheless, creating predictive models from follow-up clinical information is challenging because time series clinical data are non-uniform, high-dimensional, multivariate observations. Here, we present the first attempt at creating a model that uses time series follow-up information to predict major bleeding over time using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells.
Methods: A total of 2542 patients diagnosed with VTE were enrolled in a prospective cohort study over 8 years. In addition to recording their clinical information at baseline, 6-month follow-up interviews were conducted using a standard script to monitor bleeding status and record clinical information. Major bleeding was defined according to the International Society on Thrombosis and Haemostasis criteria, with suspected bleeding events classified by an independent adjudication committee. Overall, 118 patients (4.6%) had major bleeding. The median and mode of the clinical variables were used to impute missing numerical and categorical values, respectively. For patients who had no follow-up information (those whose bleeding occurred before the first follow-up), an artificial follow-up data point was generated from their corresponding baseline data. Thereafter, the data were divided into two stratified sets: 70% for training and 30% for testing. Five supervised neural network-based machine learning models with different architectures were trained on the baseline dataset, the follow-up dataset, or both to predict major bleeding. After training, these machine learning models were evaluated on the testing set and compared with conventional clinical models that use only baseline information (CHAP, HAS-BLED, VTE-BLEED, RIETE, ACCP, and OBRI), each modified to be compatible with the predictor variables available in our dataset.
Results: Overall, the models that used the follow-up information had a higher area under the receiver operating characteristic curve (AUROC), or c-statistic, than the models that relied only on the baseline dataset. In particular, the LSTM RNN model achieved an AUROC of 81.3%, which is more than 10% higher than that of the best-performing clinical model. We found that, to predict bleeding from the follow-up dataset, the LSTM RNN model relied mostly on features such as the number of concomitant medications, years since the baseline visit, use of specific antibiotics or antiplatelet agents, and presence of new hypertension. Furthermore, half of the bleeding events occurred within the first year after patients' baseline visits, a trend reflected in the predictions made by the LSTM RNN model. Finally, the models that used both the baseline and the follow-up datasets showed different results depending on their architectures: the simpler ensemble model achieved an AUROC of 82.5%, while the more complex model had an AUROC of 70.8% due to overfitting.
Conclusion: We have shown that using time series follow-up data can improve bleeding risk prediction in patients with VTE who are on extended anticoagulant therapy compared with using the baseline data alone, and clinicians might benefit from such an approach. Furthermore, our results indicate that an LSTM RNN is a suitable architecture for modelling routine clinical follow-up data. Finally, we believe that using time series data could improve the performance of other clinical models that are currently based on one-time baseline measurements.
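The preprocessing described in the Methods (median and mode imputation followed by a stratified 70/30 split) could be sketched as follows. This is a minimal illustration in plain Python, not the study's actual code: the field names `age`, `sex`, and `bleed`, the helper names `impute` and `stratified_split`, the toy records, and the random seed are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict
from statistics import median


def impute(rows, numeric_cols, categorical_cols):
    """Fill None values: median for numeric columns, mode for categorical ones."""
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for col in numeric_cols:
        fill = median(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = Counter(observed).most_common(1)[0][0]  # most frequent value
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


def stratified_split(rows, label_key, train_frac=0.7, seed=0):
    """Split rows so each outcome class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for r in rows:
        by_label[r[label_key]].append(r)
    train, test = [], []
    for group in by_label.values():
        group = group[:]
        rng.shuffle(group)
        cut = round(train_frac * len(group))  # per-class training size
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical toy cohort: 20 patients, 4 of whom have a major bleed.
patients = [
    {"age": 50 + i, "sex": "F" if i % 2 else "M", "bleed": int(i % 5 == 0)}
    for i in range(20)
]
patients[3]["age"] = None  # missing numerical value
patients[4]["sex"] = None  # missing categorical value

# As in the Methods, imputation is applied first and the split afterwards.
complete = impute(patients, numeric_cols=["age"], categorical_cols=["sex"])
train_set, test_set = stratified_split(complete, label_key="bleed")
```

Splitting within each outcome class matters here because major bleeding is rare (4.6% in this cohort); an unstratified split could leave the test set with too few events to estimate the AUROC reliably.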