A block implementation of a Schur-type algorithm for the recursive least-squares problem on parallel processors is described. The parallel architecture can process an entire block of data simultaneously in one hardware clock cycle, using regularly and locally connected processing elements. Such a computing structure is desirable for VLSI implementation. Moreover, the latency penalty is on the order of the block size and the filter order, which is small compared with previous results.