Generic tree ensembles (such as Random Forest, RF) rely on a substantial amount of individual models to attain desirable performance. The cost of maintaining a large ensemble could become prohibitive in applications where computing resources are stringent. In this work, a hierarchical ensemble reduction and learning framework is proposed. Experiments show our method consistently outperforms RF in terms of both accuracy and retained ensemble size. In other words, ensemble reduction is achieved with enhancement in accuracy rather than degradation. The method can be executed efficiently, up to >590× time reduction than a recent ensemble reduction work. We also developed Boolean logic encoding techniques to directly tackle multiclass problems. Moreover, our framework bridges the gap between software-based ensemble methods and hardware computing in the IoT era. We developed a novel conversion paradigm that supports the automatic deployment of >500 trees on a chip. Our proposed method reduces power consumption and overall area utilization by >21.5% and >62%, respectively, comparing with RF. The hierarchical approach provides rich opportunities to balance between the computation (training and response time), the hardware resource (memory and energy), and accuracy.