The Bonsai Merkle tree (BMT) is a widely used tree structure for authenticating metadata, such as encryption counters, in a secure computing system. Common BMT algorithms were designed for traditional von Neumann architectures with a software-centric implementation in mind, and as such they are predominantly recursive and sequential. However, modern heterogeneous computing platforms that employ Field-Programmable Gate Array (FPGA) devices require concurrency-focused algorithms to fully exploit the versatility and parallelism of such systems. The recursive nature of traditional BMT algorithms makes them challenging to implement in such hardware-based setups. Our goal in this work is to introduce HMT, a hardware-friendly BMT algorithm that enables the verification and update processes to function independently and provides the benefits of relaxed update while remaining comparable to eager update in update complexity. The HMT methodology contributes both novel algorithmic revisions and innovative hardware techniques for implementing BMT. We mathematically demonstrate the challenge posed by potentially unbounded recursion in relaxed BMT updates. To solve this problem, we use a partitioned BMT caching scheme that allocates a separate write-back cache to each BMT level, which gives low, fixed upper bounds on dirty evictions compared to traditional BMT caches. We then introduce the hybrid BMT algorithm itself, which is hardware-targeted and parallel; it relaxes updates depending on BMT cache hits but makes the update conditions more flexible than those of lazy update in order to save additional write-backs. Using this new algorithm, we have designed a new BMT controller with a dataflow architecture, including speculative buffers and parallel write-back engines, to support performance-enhancing mechanisms (such as multiple concurrent authentications and independent updates) that were not possible with the conventional lazy algorithm.
Our empirical performance measurements on a Xilinx U200 accelerator FPGA demonstrate that HMT achieves up to a 7× improvement in bandwidth and a 4.5× reduction in latency over a lazy-update BMT baseline, and up to 14% faster execution on standard benchmarks compared to a state-of-the-art eager-update BMT solution.