Abstract

In this paper, we investigate the performance of a nonblocking packet switch that has input buffers and a limited amount of buffering within the switch fabric, where contention for the output ports occurs. We propose a novel scheduling scheme based on head-of-line blocking, which improves performance significantly. Under uniform random traffic, the achievable throughput of a 16 × 16 switch is 87.5%. We also study the performance of the switch modules under unbalanced and bursty traffic. Examining the switch under two delay-priority classes reveals that the achievable throughput can be increased to 91%. To build a large switching system, we use a three-stage interconnection network, which meets the demands of large-scale ATM switch design, such as (1) modularity, (2) relaxed synchronization, (3) guaranteed high performance (i.e., high throughput and low delay variability) without requiring large internal speed-up, and (4) preservation of packet sequence integrity.