Recent breakthroughs in Neural Networks (NNs) have led to significant accuracy improvements in several machine learning applications, such as image classification and voice recognition. However, these accuracy improvements come at the cost of an immense increase in computational demands, and NNs have become one of the most common and computationally intensive workloads in today's datacenters. To address these demands, Google announced the Tensor Processing Unit (TPU) in 2016, a custom ASIC accelerator for NN inference. Two new TPU versions (v2 and v3), which also support training, followed in 2017 and 2018. The Google TPUv3 packs immense processing power ( <inline-formula><tex-math notation="LaTeX">$\mathrm{90TFLOPS}$</tex-math></inline-formula> per chip) into a tiny, dense area, leading to very high on-chip power densities and thus excessive temperatures. In this article, we consider superlattice thermoelectric cooling, one of the emerging on-chip cooling techniques, as an advanced cooling example for the Google TPU, and we investigate the impact of the Negative Capacitance FET (NCFET), one of the recent emerging technologies, on the cooling and efficiency of the TPU. Through a full-chip design of the computational core of the TPU, based on the <inline-formula><tex-math notation="LaTeX">$14\mathrm{nm}$</tex-math></inline-formula> Intel FinFET technology, together with multiphysics temperature simulations, we demonstrate that NCFET can significantly reduce the required cooling cost. More than 4000 NCFET configurations are evaluated in order to traverse the entire design space defined by the thickness of the ferroelectric layer of the NCFET, the operating voltage, the cooling, and the operating frequency, in addition to all possible FinFET configurations. Moreover, our experimental evaluation shows that, by eliminating the cooling cost, NCFET delivers 2.8x higher efficiency compared to the conventional FinFET baseline.