Image compression is an emerging technique in digital systems for storing and retrieving digital media. The main challenge in image compression is reducing storage requirements while maintaining high computation speed. In this paper we address this problem and develop a memory-efficient, high-speed architecture based on an orthonormalized multi-stage Fast-DST processing unit that performs the lifting operation. The proposed multi-stage transform unit performs the split, predict, and update operations while also processing the odd samples, which are neglected in other lifting transforms. Because both sets of samples are executed simultaneously, the process is faster. The RTU and CTU are built from delay elements and the lifting coefficient, which further improves processing speed. To address the high cost of memory, multiple stages of the proposed DST unit are combined into a parallel multi-stage architecture that can perform multi-stage parallel execution on the input image at competitive hardware cost. Finally, the proposed method attains better results than existing methods in terms of memory complexity, power consumption, and latency.
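For readers unfamiliar with lifting, the split, predict, and update steps can be illustrated with a minimal software sketch. This is a generic integer Haar-style lifting step, not the proposed Fast-DST unit or its hardware architecture; the function names `lifting_forward` and `lifting_inverse` and the choice of predict and update rules are assumptions made purely for illustration.

```python
def lifting_forward(x):
    """One level of a generic integer lifting step (Haar-style, illustrative only).

    Assumes len(x) is even. Returns (approx, detail).
    """
    # Split: separate the signal into even-indexed and odd-indexed samples.
    even = x[0::2]
    odd = x[1::2]
    # Predict: estimate each odd sample from its even neighbour; keep the error.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: adjust the even samples so they approximate the local average.
    approx = [e + d // 2 for e, d in zip(even, detail)]
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo lifting_forward by reversing the steps in the opposite order."""
    even = [a - d // 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    # Merge: interleave even and odd samples back into one sequence.
    x = [0] * (2 * len(even))
    x[0::2] = even
    x[1::2] = odd
    return x


if __name__ == "__main__":
    row = [10, 12, 14, 20, 7, 3, 0, 255]  # hypothetical row of pixel values
    approx, detail = lifting_forward(row)
    print(approx, detail)
    print(lifting_inverse(approx, detail) == row)
```

Because each step is undone exactly by its mirror step, the integer version reconstructs the input without loss. In a two-dimensional transform the same step would typically be applied along rows and then along columns.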