New and future communication systems require much greater flexibility and scalability to support diverse scenarios. In this paper, a high-throughput low-density parity-check (LDPC) decoder on a graphics processing unit (GPU) is presented to meet these flexibility and scalability requirements. A memory-reduced forward/backward approach for the check node update is proposed. Moreover, on-chip memory is carefully allocated to improve memory bandwidth. The proposed (26112, 8448) 5G LDPC decoder achieves a decoding throughput of 27.6 Gbps on an RTX 4090 with five layered iterations, with a latency below 1 ms. Compared with related works, the presented LDPC decoder achieves throughput speedups ranging from 1.18× to 12.4×.
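As background on the forward/backward idea, the following is a minimal sketch of a generic forward/backward check node update. It assumes the min-sum approximation of the box-plus operator, because the abstract does not state which check node rule the decoder uses. The function names `minsum_pair` and `check_node_update` are hypothetical. The sketch keeps full forward and backward arrays, so it illustrates only the baseline technique and does not reproduce the proposed memory reduction.

```python
import math
from typing import List


def minsum_pair(a: float, b: float) -> float:
    """Min-sum approximation of the box-plus operator a ⊞ b (assumed rule)."""
    # Sign is the product of the input signs (zero is treated as positive).
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    # Magnitude is the smaller of the two input magnitudes.
    return sign * min(abs(a), abs(b))


def check_node_update(v2c: List[float]) -> List[float]:
    """Forward/backward check-to-variable update for one check node.

    v2c holds the incoming variable-to-check messages (LLRs). Output i
    combines every input except v2c[i]. +inf is the identity element of
    the min-sum operator, so it seeds both recursions.
    """
    d = len(v2c)
    fwd = [math.inf] * (d + 1)  # fwd[k] = v2c[0] ⊞ ... ⊞ v2c[k-1]
    bwd = [math.inf] * (d + 1)  # bwd[k] = v2c[k] ⊞ ... ⊞ v2c[d-1]
    for k in range(d):
        fwd[k + 1] = minsum_pair(fwd[k], v2c[k])
    for k in range(d - 1, -1, -1):
        bwd[k] = minsum_pair(v2c[k], bwd[k + 1])
    # Edge i: everything before i (fwd[i]) combined with everything after i (bwd[i + 1]).
    return [minsum_pair(fwd[i], bwd[i + 1]) for i in range(d)]


if __name__ == "__main__":
    print(check_node_update([1.0, -2.0, 3.0]))  # [-2.0, 1.0, -1.0]
```

For a check node of degree d, the two recursions cost about 2d pairwise operations plus d combinations, instead of the d(d-1) operations of recomputing each output from scratch. The price is storing the intermediate forward and backward values, which is the storage a memory-reduced variant would aim to shrink.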