Deep learning, a branch of artificial intelligence, relies on specialised neural networks. Despite decades of research on deep neural network (DNN) algorithms, their computational acceleration still demands efficient architectures, high memory bandwidth, parallel processing, and substantial hardware resources. DNN implementations suffer from excessive area requirements, driven by resource-intensive components such as activation functions (AFs) and multiply-and-accumulate (MAC) units. Moreover, edge-AI applications demand compact, power-efficient, high-throughput DNN accelerators. Building a DNN accelerator that consumes little power and occupies little area therefore requires optimising the MAC architecture, the AF, and the network complexity so that data flows efficiently. In addition, ASIC-based DNN hardware designs face the challenge of providing functional configurability within a limited chip area.

This dissertation explores efficient, low-power VLSI architectures for DNN accelerators, addressing the hardware implementation of DNNs and targeting resource-constrained applications. We investigate and enhance the CORDIC (COordinate Rotation DIgital Computer) architecture to evaluate MAC and non-linear AF operations. Although CORDIC-based architectures are area- and power-efficient, their low throughput is a major drawback. We therefore propose a performance-oriented pipelined design for the CORDIC-based MAC and AF. Because pipeline stages increase hardware resource consumption, this study examines the mutual exclusivity of CORDIC stages and analyses in depth how accuracy varies with the number of stages required to achieve high throughput.
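To illustrate the trade-off between iteration (stage) count and accuracy that motivates this analysis, the following minimal Python sketch evaluates a tanh-style AF using hyperbolic CORDIC in rotation mode. It is a software model only, not the proposed hardware design; the function name `cordic_tanh`, the iteration count `n_iters` standing in for pipeline stages, and the test values are illustrative assumptions.

```python
import math

def cordic_tanh(theta, n_iters=16):
    """Approximate tanh(theta) with hyperbolic CORDIC in rotation mode.

    Illustrative sketch: n_iters stands in for the number of unrolled
    pipeline stages; accuracy improves as stages are added. Convergence
    holds for |theta| < ~1.1 without argument reduction.
    """
    repeat = {4, 13, 40}          # hyperbolic CORDIC repeats these indices
    x, y, z = 1.0, 0.0, theta
    i, steps = 1, 0
    while steps < n_iters:
        for _ in range(2 if i in repeat else 1):
            d = 1.0 if z >= 0 else -1.0
            # Shift-and-add micro-rotation (multiplications by 2**-i
            # correspond to wired shifts in hardware).
            x, y, z = (x + d * y * 2.0**-i,
                       y + d * x * 2.0**-i,
                       z - d * math.atanh(2.0**-i))
            steps += 1
            if steps >= n_iters:
                break
        i += 1
    # The CORDIC gain cancels in the ratio, so tanh = sinh/cosh = y/x.
    return y / x

if __name__ == "__main__":
    # Accuracy versus number of stages for a sample input.
    for n in (4, 8, 12, 16):
        approx = cordic_tanh(0.75, n)
        print(n, approx, abs(approx - math.tanh(0.75)))
```

Because each micro-rotation reduces to shifts, additions, and a small constant table, unrolling the loop into pipeline stages raises throughput at the cost of area, which is the trade-off the dissertation quantifies.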