The AOSpine classification divides thoracolumbar burst fractures into A3 and A4 fractures, but previous research has reported inconsistent interobserver reliability in distinguishing these two fracture patterns. This systematic review aims to synthesize data on the reliability of discriminating between A3 and A4 fractures. We searched PubMed, Scopus, and Web of Science for studies reporting the inter- and intra-observer reliability of detecting thoracolumbar AO A3 and A4 fractures on computed tomography (CT). The search spanned 2013 to 2023 and included both primary reliability studies and observational comparative studies. We followed the PRISMA guidelines and used the modified COSMIN checklist to assess study quality. Kappa coefficient (k) values were categorized according to Landis and Koch, from slight to excellent. Of the 396 identified studies, nine met the eligibility criteria; all were primary reliability studies except one observational study. Interobserver k values for A3/A4 fractures varied widely among studies (0.19–0.86). Interobserver reliability was poor in two studies, fair in one, moderate in four, and excellent in two. Only two studies reported intra-observer reliability, showing fair and excellent agreement, respectively. The included studies showed substantial heterogeneity in study design, sample size, and interpretation methods. Considerable variability thus exists in interobserver reliability for distinguishing A3 from A4 fractures, ranging from slight to excellent agreement. This variability may reflect methodological heterogeneity among studies, limitations of reliability analysis, or diagnostic pitfalls in differentiating A3 from A4 fractures. Most observational studies comparing the outcomes of A3 and A4 fractures do not report interobserver agreement, which should be considered when interpreting their results.
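To make the reported agreement statistics concrete, the following is a minimal sketch (not taken from any of the included studies) of how a Cohen's kappa between two raters classifying fractures as "A3" or "A4" can be computed and then banded using Landis and Koch style cut-offs; the band labels here follow the wording used in this review (e.g. "excellent" for k > 0.80), and the rater data are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases.

    k = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginals.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def landis_koch_band(k):
    """Landis and Koch (1977) style interpretation bands.

    The top band is labeled "excellent" to match this review's wording;
    Landis and Koch's original label is "almost perfect".
    """
    if k < 0.00:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "excellent"

# Hypothetical ratings of eight CT scans by two observers.
rater_1 = ["A3", "A3", "A4", "A4", "A3", "A4", "A3", "A4"]
rater_2 = ["A3", "A4", "A4", "A4", "A3", "A4", "A4", "A4"]

k = cohens_kappa(rater_1, rater_2)
print(f"kappa = {k:.2f} ({landis_koch_band(k)})")
```

Note that kappa depends on the marginal prevalence of each category as well as on raw agreement, which is one reason reliability estimates for the same fracture distinction can vary so widely across study samples.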