Context: Because cancer is a chronic disease, screening programs consist of a sequence of decisions taken over time. Markov decision process (MDP) and partially observable Markov decision process (POMDP) models are mathematical tools applied to sequential decision-making problems in medicine, including the design of screening protocols. Objectives: The main goal of this study was to investigate optimal policies for cancer screening derived from MDP and POMDP models. Methods: We reviewed articles published between July 2000 and November 2022 in the PubMed, Web of Science, and Scopus databases. The stopping age, the type of optimal strategy, the benefits of the optimal policy, and the relationship between age and risk threshold were extracted. We excluded studies that did not use MDPs or POMDPs as the mathematical optimization models for cancer screening, review articles, editorials and commentaries, non-English articles, and studies that did not focus on optimization. Results: Of 532 articles, 6 met the inclusion criteria. All studies found the optimal policy to be of control-limit type, with a cancer risk threshold that is a non-decreasing function of age. Three studies specified a stopping age for cancer screening. In five studies, the optimal policies outperformed existing guidelines or a no-screening strategy. Conclusions: Two essential factors in screening decisions are cancer risk and age, both individual variables. Control-limit policies incorporate these factors into decision-making for cancer screening, support personalized screening, and can outperform cancer screening guidelines in terms of both economic and clinical benefits.
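The control-limit structure reported across the reviewed studies can be sketched as a simple decision rule: screen whenever estimated cancer risk exceeds an age-dependent, non-decreasing threshold, and stop screening past a stopping age. The threshold values and stopping age below are illustrative assumptions, not figures from any of the reviewed articles.

```python
# Hypothetical sketch of a control-limit screening policy. All numbers
# (thresholds, stopping age) are illustrative assumptions only.

def risk_threshold(age: int) -> float:
    """Non-decreasing cancer risk threshold as a function of age (illustrative)."""
    if age < 50:
        return 0.02
    if age < 65:
        return 0.05
    return 0.10

def screening_decision(age: int, estimated_risk: float, stopping_age: int = 75) -> str:
    """Control-limit rule: screen iff estimated risk meets or exceeds the
    age-specific threshold, and the patient is below the stopping age."""
    if age >= stopping_age:
        return "stop"
    return "screen" if estimated_risk >= risk_threshold(age) else "wait"

if __name__ == "__main__":
    print(screening_decision(45, 0.01))  # below threshold -> "wait"
    print(screening_decision(55, 0.06))  # above threshold -> "screen"
    print(screening_decision(80, 0.20))  # past stopping age -> "stop"
```

In a POMDP formulation, `estimated_risk` would correspond to the belief that the patient is in a cancerous state, updated from screening history; the key structural result of the reviewed studies is that the optimal action depends on that belief only through such a threshold comparison.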