This paper presents a hardware architecture for the 8 × 8 2D Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) based on the Taylor-series expansion of trigonometric functions. The DCT/IDCT processing is modified to eliminate the need for 1) partitioning the input image into fixed-size blocks and 2) a transpose buffer for storing intermediate results, which yields a lower clock cycle time than existing techniques. The values obtained from the Taylor series are approximated by fixed-point numbers, which reduces the hardware complexity at the cost of an acceptable loss in output image quality. Based on this fixed-point approach, 18-bit, 24-bit, and 32-bit DCT and IDCT architectures are proposed. The baseline DCT architecture computes all 64 coefficients and is therefore named DCT64. An important property of the 2D DCT is that most of the signal energy is compacted into the low-frequency coefficients, which are sufficient to reconstruct the original image. Taking this into account, three further DCT architectures, namely DCT15, DCT10, and DCT6, which compute 15, 10, and 6 coefficients, respectively, are designed and compared with DCT64. The percentage improvement in area utilization and performance is reported for DCT15, DCT10, and DCT6 relative to DCT64. The improvement increases gradually from DCT15 through DCT6, with a marginal decline in output image quality at each step, as shown by the measured PSNR and SSIM values. For the IDCT, the proposed architecture achieves a low percentage of area utilization and performance similar to that of the DCT. The field-programmable gate array (FPGA) implementation of the proposed DCT architecture operates at a higher frequency and achieves lower dynamic power, slice register, and slice LUT figures, thereby outperforming existing Algebraic Integer (AI) and fixed-point implementations.
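
As an informal software illustration of the ideas summarized above (not the proposed hardware design), the following Python sketch builds the 8 × 8 DCT basis from a truncated Taylor series for the cosine, rounds the basis values to a fixed number of fractional bits, and reconstructs a block from only its low-frequency coefficients. Several details are assumptions made for this sketch only: the number of series terms, the choice of 16 fractional bits, the use of floating-point arithmetic for the matrix products, and the selection of retained coefficients by anti-diagonals of the coefficient matrix (3, 4, and 5 anti-diagonals give 6, 10, and 15 coefficients). All function names are hypothetical.

```python
import math

N = 8  # block size of the 8 x 8 transform


def taylor_cos(x, terms=10):
    """Truncated Maclaurin series for cos(x), after reducing x to [-pi, pi]."""
    x = math.fmod(x, 2 * math.pi)
    if x > math.pi:
        x -= 2 * math.pi
    elif x < -math.pi:
        x += 2 * math.pi
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2k+1)(2k+2)).
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2^-frac_bits (assumed fixed-point grid)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def dct_matrix(frac_bits):
    """Orthonormal DCT-II matrix with Taylor-series, fixed-point-rounded entries."""
    rows = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        rows.append([
            quantize(alpha * taylor_cos((2 * n + 1) * k * math.pi / (2 * N)), frac_bits)
            for n in range(N)
        ])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block, c):
    """2D DCT as C * X * C^T."""
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs, c):
    """2D IDCT as C^T * Y * C."""
    return matmul(matmul(transpose(c), coeffs), c)


def keep_low_frequency(coeffs, diagonals):
    """Zero every coefficient (u, v) with u + v >= diagonals (assumed selection rule)."""
    return [[coeffs[u][v] if u + v < diagonals else 0.0 for v in range(N)] for u in range(N)]


def psnr(a, b, peak=255.0):
    mse = sum((a[i][j] - b[i][j]) ** 2 for i in range(N) for j in range(N)) / (N * N)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)


def reconstruct(block, diagonals, frac_bits=16):
    """Forward DCT, keep the low-frequency coefficients, inverse DCT."""
    c = dct_matrix(frac_bits)
    return idct2(keep_low_frequency(dct2(block, c), diagonals), c)


if __name__ == "__main__":
    # Smooth synthetic test block (a gradient), not data from the paper.
    block = [[128.0 + 10 * i + 5 * j for j in range(N)] for i in range(N)]
    for name, diagonals in [("DCT64", 15), ("DCT15", 5), ("DCT10", 4), ("DCT6", 3)]:
        print(name, round(psnr(block, reconstruct(block, diagonals)), 2), "dB")
```

Running the script prints one PSNR value per configuration for a synthetic gradient block. These numbers only illustrate the trade-off between retained coefficients and reconstruction quality and are not comparable to the PSNR and SSIM measurements reported for the hardware.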