We consider the first-order system space–time formulation of the heat equation introduced by Bochev and Gunzburger (in: Bochev and Gunzburger (eds) Applied mathematical sciences, vol 166, Springer, New York, 2009), and analyzed by Führer and Karkulik (Comput Math Appl 92:27–36, 2021) and Gantner and Stevenson (ESAIM Math Model Numer Anal 55(1):283–299, 2021), with solution components $(u_1,\textbf{u}_2)=(u,-\nabla_\textbf{x} u)$. The corresponding operator is boundedly invertible between a Hilbert space $U$ and a Cartesian product of $L_2$-type spaces, which facilitates easy first-order system least-squares (FOSLS) discretizations.
Besides $L_2$-norms of $\nabla_\textbf{x} u_1$ and $\textbf{u}_2$, the (graph) norm of $U$ contains the $L_2$-norm of $\partial_t u_1 + \textrm{div}_\textbf{x}\, \textbf{u}_2$. When applying standard finite elements w.r.t. simplicial partitions of the space–time cylinder, estimates of the approximation error w.r.t. the latter norm require higher-order smoothness of $\textbf{u}_2$.
In experiments for both uniform and adaptively refined partitions, this manifested itself in disappointingly low convergence rates for non-smooth solutions $u$. In this paper, we construct finite element spaces w.r.t. prismatic partitions. They come with a quasi-interpolant that satisfies a near commuting diagram in the sense that, apart from some harmless term, the aforementioned error depends exclusively on the smoothness of $\partial_t u_1 + \textrm{div}_\textbf{x}\, \textbf{u}_2$, i.e., of the forcing term $f=(\partial_t-\Delta_x)u$. Numerical results show significantly improved convergence rates.
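For orientation, the first-order system and the graph norm referred to above can be written out as follows. This is a sketch under assumptions not stated in the abstract: a space–time cylinder $Q=I\times\Omega$ with $I=(0,T)$, homogeneous Dirichlet boundary conditions, an initial datum $u_0$, and a finite element subspace denoted $U_h\subset U$; all of these symbols are introduced here for illustration only.

$$
\partial_t u_1 + \textrm{div}_\textbf{x}\, \textbf{u}_2 = f \ \text{ in } Q, \qquad
\textbf{u}_2 + \nabla_\textbf{x} u_1 = 0 \ \text{ in } Q, \qquad
u_1(0,\cdot) = u_0 \ \text{ in } \Omega .
$$

Under these assumptions, the squared graph norm described in the text would read

$$
\Vert (u_1,\textbf{u}_2)\Vert_U^2
= \Vert \nabla_\textbf{x} u_1\Vert_{L_2(Q)}^2
+ \Vert \textbf{u}_2\Vert_{L_2(Q)}^2
+ \Vert \partial_t u_1 + \textrm{div}_\textbf{x}\, \textbf{u}_2\Vert_{L_2(Q)}^2 ,
$$

and a FOSLS discretization would select the minimizer over $U_h$ of the residual functional

$$
(w_1,\textbf{w}_2) \mapsto
\Vert \partial_t w_1 + \textrm{div}_\textbf{x}\, \textbf{w}_2 - f\Vert_{L_2(Q)}^2
+ \Vert \textbf{w}_2 + \nabla_\textbf{x} w_1\Vert_{L_2(Q)}^2
+ \Vert w_1(0,\cdot) - u_0\Vert_{L_2(\Omega)}^2 .
$$

The third term of the graph norm is the one whose approximation error, for standard simplicial elements, requires the higher-order smoothness of $\textbf{u}_2$ mentioned above.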