Optimized deep neural network (DNN) models and energy-efficient hardware designs are of great importance in edge-computing applications. Neural architecture search (NAS) methods are employed to optimize DNN models with mixed-bitwidth networks. To satisfy the resulting computation requirements, mixed-bitwidth convolution accelerators are highly desired for low-power and high-throughput performance. Several methods exist to support mixed-bitwidth multiply-accumulate (MAC) operations in DNN accelerator designs. The low-bitwidth-combination (LBC) method improves low-bitwidth throughput but incurs a large hardware cost. The high-bitwidth-split (HBS) method minimizes the additional logic gates needed for configuration; however, its throughput in the low-bitwidth mode is poor. In this work, a bit-split-and-combination (BSC) systolic accelerator is proposed. The BSC-based MAC unit supports mixed-bitwidth operations with the best overall trade-off between throughput and hardware cost. In addition, the inter-processing-element (PE) systolic and intra-PE parallel dataflow not only improves throughput in mixed-bitwidth modes but also reduces the power consumed by data transmission. The proposed design is implemented and synthesized in a 28-nm process. The BSC MAC unit achieves up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$2.08\times $ </tex-math></inline-formula> and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$1.75\times $ </tex-math></inline-formula> higher energy efficiency than the HBS and LBC units, respectively. Compared with state-of-the-art accelerators, the proposed work also achieves excellent energy efficiency of 20.02, 23.55, and 30.17 TOPS/W on mixed-bitwidth VGG-16, ResNet-18, and LeNet-5 benchmarks at 0.6 V, respectively.
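As background for the split-and-combine idea underlying HBS-style mixed-bitwidth MAC units, the sketch below shows how one unsigned 8×8 multiply can be composed from four 4×4 partial products that are shifted to their bit positions before accumulation. This is only an illustrative model of the decomposition principle, not the authors' BSC circuit: function names are hypothetical, and real hardware must additionally handle signed two's-complement operands and configurable precision modes.

```python
def split4(x):
    # split an unsigned 8-bit value into its high and low 4-bit nibbles
    return (x >> 4) & 0xF, x & 0xF

def mac8_from_4bit(a, b, acc=0):
    # compose one unsigned 8x8 multiply from four 4x4 partial products,
    # shifting each partial product to its weight before combining,
    # then accumulate into acc (the MAC step)
    ah, al = split4(a)
    bh, bl = split4(b)
    partial = (ah * bh << 8) + (ah * bl << 4) + (al * bh << 4) + (al * bl)
    return acc + partial

# the decomposition reproduces the full-precision product
assert mac8_from_4bit(0xB7, 0x5C) == 0xB7 * 0x5C
```

In a low-bitwidth mode, the same four 4×4 multipliers could instead process four independent 4-bit operand pairs per cycle, which is the throughput benefit that split-and-combine designs aim to recover.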