Matrix-vector multiplication is the computational kernel of many numerical algorithms. Typically, matrix-vector multiplication algorithms exploit the sparsity of the matrix, either to reduce the running time or to reduce the memory used. In parallel implementations of numerical algorithms, the sparsity of the matrix can also be exploited to reduce the number of processing elements (PEs) required. For example, in Leiserson's systolic array algorithm for matrix-vector multiplication, the nonzeros of the matrix are partitioned into bands and each band is fed into a different PE. Similarly, in Melhem's algorithm, the nonzeros are partitioned into stripes and each stripe is fed into a distinct PE. However, in many practical applications, such as the numerical solution of partial differential equations, a matrix is much sparser than its bandwidth or its stripe width indicates. A new measure of sparsity, called the staircase width of a matrix, is introduced. A staircase in a matrix $M = (m_{i,j})_{n \times n}$ is a set $S$ of positions $(i, j)$ with $m_{i,j} \neq 0$ such that no position in $S$ is strictly to the northeast of another position in $S$ (that is, in a strictly earlier row and a strictly later column). A staircase cover of a matrix is a partition of the positions of its nonzero entries into staircases, and the staircase width of the matrix is the size of a smallest staircase cover. The staircase width of a matrix is never larger than either its bandwidth or its stripe width. Moreover, it is significantly smaller than both for some commonly occurring types of matrices, such as arrowhead matrices. Experiments with sparse matrices derived from a variety of engineering problems suggest that, in practice, the staircase width of a matrix is about half its stripe width. An efficient algorithm to compute a minimum-width staircase cover of a matrix is presented. Staircase covers are then used as the basis of a new parallel algorithm for computing a matrix-vector product on a linear systolic array of PEs. This algorithm uses only as many PEs as the staircase width of the matrix and therefore requires fewer PEs than the algorithms of Leiserson and Melhem. It is shown that, despite using fewer PEs, the algorithm uses no more time steps than either Leiserson's or Melhem's algorithm.
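The abstract does not spell out the cover algorithm, so the following is only a minimal Python sketch of one way to obtain a minimum staircase cover: a Mirsky-style layering in which each nonzero position is assigned the length of the longest chain of pairwise strictly-northeast positions it extends. The function name `staircase_cover` and the O(N²) sweep are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch (an assumption, not the paper's algorithm): compute a
# minimum staircase cover by layering. Position (r, c) is strictly
# northeast of (i, j) when r < i and c > j; a staircase is a set of
# nonzero positions containing no such pair.

def staircase_cover(positions):
    """Partition nonzero positions (row, col) into a minimum staircase cover.

    level(p) = 1 + length of the longest chain of pairwise strictly
    northeast positions that p extends. No two positions on the same
    level can form a northeast pair, so each level is a staircase, and
    the number of levels equals the longest chain, which is a lower
    bound on any cover (Mirsky's theorem). O(N^2) for N nonzeros.
    """
    level = {}
    # Sweep by increasing column so every candidate predecessor
    # (strictly below and strictly to the west) is already levelled.
    for (i, j) in sorted(positions, key=lambda p: p[1]):
        preds = [lv for (r, c), lv in level.items() if r > i and c < j]
        level[(i, j)] = 1 + max(preds, default=0)
    width = max(level.values(), default=0)  # the staircase width
    return [[p for p, lv in level.items() if lv == k]
            for k in range(1, width + 1)]


if __name__ == "__main__":
    # A 5x5 arrowhead matrix: nonzeros on the diagonal, the first row,
    # and the first column. Its bandwidth grows with n, but its
    # staircase width is the constant 3.
    n = 5
    nz = ({(i, i) for i in range(n)} | {(0, j) for j in range(n)}
          | {(i, 0) for i in range(n)})
    print(len(staircase_cover(nz)))  # -> 3
```

The arrowhead example illustrates the claim in the abstract: the cover has three staircases (the first column, a descending staircase through the diagonal, and the remainder of the first row) regardless of n, whereas the bandwidth is proportional to n.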
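Given any staircase cover, the nonzeros can be divided among PEs, one staircase per PE. The sketch below emulates only that work partition sequentially; the systolic schedule, data movement, and time-step analysis of the paper's algorithm are not modeled, and `matvec_by_staircases` is a hypothetical name introduced here.

```python
# Sketch of the staircase-based work partition for y = M x: one staircase
# per PE. This emulates only the assignment of nonzeros to PEs, not the
# systolic array's schedule or inter-PE data movement.

def matvec_by_staircases(M, x, cover):
    """Accumulate y = M @ x, one accumulation pass per staircase ("PE")."""
    y = [0.0] * len(M)
    for staircase in cover:        # each staircase is owned by one PE
        for (i, j) in staircase:   # the nonzeros assigned to that PE
            y[i] += M[i][j] * x[j]
    return y                       # staircases partition the nonzeros,
                                   # so the passes sum to the full product


# Usage on the 5x5 arrowhead matrix, with one valid 3-staircase cover
# written out explicitly so the snippet is self-contained:
n = 5
M = [[0.0] * n for _ in range(n)]
for i in range(n):
    M[i][i] = 2.0
    M[0][i] = M[i][0] = 1.0
M[0][0] = 2.0
cover = [
    [(i, 0) for i in range(n)] + [(4, 4)],  # first column plus (4,4)
    [(0, 1), (1, 1), (2, 2), (3, 3)],       # a descending staircase
    [(0, 2), (0, 3), (0, 4)],               # remainder of the first row
]
print(matvec_by_staircases(M, [1.0] * n, cover))  # [6.0, 3.0, 3.0, 3.0, 3.0]
```

Because the staircases partition the positions of the nonzero entries, the per-staircase accumulations always combine to the exact product; only three PEs are needed here, matching the staircase width.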