OpenFlow switches are being deployed in software-defined networking (SDN) to enable a wide spectrum of non-traditional applications. As a promising alternative to brute-force TCAMs, FPGA-based packet classification is being actively investigated. However, none of the existing FPGA designs achieves high performance on both search and update for large-scale rule sets. To address this issue, we propose TcbTree, an FPGA-based algorithmic scheme for packet classification. On the algorithmic side, i) a two-stage framework consisting of heterogeneous algorithms is proposed, in which most rules are mapped into several balanced trees without rule replication, ii) for the few remaining rules, a centralized TSS (Tuple Space Search) architecture with a real-time feedback scheme is designed to improve the efficiency of TSS search on FPGA, and iii) a tree dilution method is designed to equalize the rule distribution across trees, thereby reducing tree search latency. On the hardware side, i) an efficient set of data structures is designed to convert tree traversal into an addressing process, which removes the constraints of limited tree depth and imbalanced node distribution, and ii) in contrast to fully pipelined designs, multiple levels of parallelism are exploited through multi-core, multi-search-engine and coarse-grained pipeline designs. Experimental results using ClassBench show that the FPGA implementation of TcbTree achieves average classification throughputs of 788.8 MPPS, 404.3 MPPS, 237 MPPS and 41.8 MPPS for 1k, 10k, 32k and 100k rule sets, respectively, and an update throughput above 1 MUPS for all benchmark rule sets.
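For readers unfamiliar with Tuple Space Search, the following is a minimal software sketch of the generic technique, not of the centralized TSS architecture or the feedback scheme proposed here. It assumes two 32-bit prefix fields (source and destination) and integer priorities where larger wins; the names `TupleSpace`, `mask`, `insert` and `classify` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

WIDTH = 32  # assumed field width in bits (illustrative only)


def mask(value, prefix_len):
    """Keep the top prefix_len bits of a WIDTH-bit value."""
    if prefix_len == 0:
        return 0
    return value & (((1 << prefix_len) - 1) << (WIDTH - prefix_len))


class TupleSpace:
    """Generic Tuple Space Search over two prefix fields (hypothetical sketch)."""

    def __init__(self):
        # One hash table per tuple of prefix lengths:
        # (src_len, dst_len) -> {(masked_src, masked_dst): (priority, rule_id)}
        self.tables = defaultdict(dict)

    def insert(self, src, src_len, dst, dst_len, priority, rule_id):
        # An update touches a single hash table, which is why TSS updates are cheap.
        key = (mask(src, src_len), mask(dst, dst_len))
        table = self.tables[(src_len, dst_len)]
        old = table.get(key)
        # Simplification: keep only the highest-priority rule per exact key.
        if old is None or priority > old[0]:
            table[key] = (priority, rule_id)

    def classify(self, src, dst):
        # A search probes every tuple once and keeps the best-priority hit.
        best = None
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((mask(src, src_len), mask(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]


ts = TupleSpace()
ts.insert(0x0A000000, 8, 0x00000000, 0, priority=1, rule_id="r1")   # 10.0.0.0/8 -> any
ts.insert(0x0A010000, 16, 0xC0A80000, 16, priority=5, rule_id="r2")  # 10.1.0.0/16 -> 192.168.0.0/16
print(ts.classify(0x0A010203, 0xC0A80101))  # matches both rules; r2 has higher priority
print(ts.classify(0x0A020304, 0x08080808))  # matches only r1
```

In this sketch the search cost grows with the number of distinct tuples rather than the number of rules, which suggests why confining TSS to a small residual rule set, as the two-stage framework does, keeps the probe count low.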