Background: Quality-of-life (QoL) metrics are increasingly important for oncological patients alongside traditional endpoints such as mortality and disease progression. Win statistics, namely the Win Ratio (WR), Win Odds (WO), and Net Benefit (NB), prioritize the most clinically relevant outcomes within a composite endpoint. In randomized trials, win statistics provide fair comparisons between treatment and control groups. However, their use in observational studies is complicated by confounding. Propensity score (PS) matching mitigates confounding but may reduce the sample size, affecting the power of win statistics analyses. Alternatively, the PS can be used to stratify the sample, which preserves the sample size. This study aims to assess the long-term impact of these methods on decision making, particularly in colorectal cancer patients. Methods: A motivating example involves a cohort of patients from the ReSARCh observational study (2016-2021) with locally advanced adenocarcinoma of the rectum, situated up to 12 cm from the anal verge. These patients underwent either a watch-and-wait approach (WW) or trans-anal local excision (LE). Win statistics were used to compare the effects of WW and LE on a composite outcome (overall survival, recurrence, presence of ostomy, and rectum excision). For matched win statistics, we used the robust inference techniques proposed by Matsouaka et al. (2022), and for stratified win statistics, we applied the method proposed by Dong et al. (2018). A simulation study assessed the coverage probability of matched and stratified win statistics in balanced and unbalanced groups, calculated as the proportion of confidence intervals that included the true values of WR, NB, and WO across 1000 simulations. Results: The results suggest better efficacy of the LE approach when considering efficacy outcomes alone (WR: 0.47 (0.01 to 1.14); NB: -0.16 (-0.34 to 0.02); and WO: 0.73 (0.5 to 1.05)). 
However, when QoL outcomes are included in the analyses, the estimates move closer to 1 (WR: 0.87 (0.06 to 2.06); WO: 0.93 (0.61 to 1.4)) and to 0 (NB: -0.04 (-0.25 to 0.17)), indicating that LE compares unfavorably with respect to the presence of an ostomy and excision of the rectum. In the simulation study, matched win statistics outperformed stratified win statistics in terms of coverage probability (matched WR: 97% vs. stratified WR: 33.3% in a high-imbalance setting; matched WR: 98% vs. stratified WR: 34.4% in a medium-imbalance setting; and matched WR: 99.2% vs. stratified WR: 37.4% in a low-imbalance setting). Conclusions: Our study sheds light on the interpretation of win statistics results in terms of statistical significance and provides insights into the application of pairwise comparisons in observational settings, promoting their use to improve outcomes for cancer patients.