We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a (family of) applications called dense stencil computations on programmable general purpose computing on graphics processing units. We first introduce a simple, analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution-time model and formulate a mathematical optimization problem that seeks to maximize a common objective function of all the hardware and software parameters . The solution to this problem, therefore, “solves” the codesign problem: simultaneously choosing software–hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2-D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3-D ones (Heat and Laplacian). Our model predicts that performance would potentially improve by 28% (respectively, 33%) with simple tweaks to the hardware parameters, such as tuning the number of streaming multiprocessors, the number of compute cores each contains, and the size of shared memory. We also develop a number of insights about the optimal regions of the design landscape.