Radio frequency interference (RFI) identification is a key step in radio data processing. To efficiently process the huge volumes of data produced by modern large radio telescopes, such as the Five-hundred-meter Aperture Spherical radio Telescope (FAST), RFI flagging algorithms must strike a good balance between accuracy and performance (throughput). RFI-Net is a single-process RFI identification package based on deep learning, and it has achieved higher flagging accuracy than the classical SumThreshold method. In this paper, we present a scalable RFI flagging toolkit that uses RFI-Net as its core detector and can drive parallel workflows on multi-CPU and multi-GPU clusters. It can automatically schedule the workload and recover from errors according to the running environment. Moreover, its main components are all pluggable and can be easily customized according to requirements. Experiments with real FAST data showed that, using eight parallel workflows, the toolkit can process sky survey data at 66.79 GB/h, which means that quasi-real-time RFI flagging can be achieved given the data rate of FAST extragalactic spectral line observations.
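
To make the idea of parallel workflows around a pluggable detector concrete, the following minimal Python sketch distributes data blocks over a worker pool and resubmits blocks whose processing failed. It is an illustration under stated assumptions, not the toolkit's implementation: the names `threshold_detector` and `flag_blocks` are hypothetical, a simple amplitude threshold stands in for RFI-Net, and a thread pool stands in for the multi-CPU and multi-GPU workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def threshold_detector(block, limit=5.0):
    """Toy stand-in for the core detector (NOT RFI-Net).

    `block` is a list of rows (time x frequency); a sample is flagged
    when its value exceeds `limit`.
    """
    return [[value > limit for value in row] for row in block]


def flag_blocks(blocks, detector=threshold_detector, workers=8, max_retries=2):
    """Run `detector` over `blocks` in parallel and retry failed blocks.

    The detector is passed in as a plain callable, so it can be swapped
    without touching the scheduling logic. Returns one mask per block,
    in the same order as `blocks`.
    """
    masks = {}
    pending = list(range(len(blocks)))

    for _attempt in range(max_retries + 1):
        if not pending:
            break
        failed = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Map each future back to the index of the block it processes.
            futures = {pool.submit(detector, blocks[i]): i for i in pending}
            for future in as_completed(futures):
                index = futures[future]
                try:
                    masks[index] = future.result()
                except Exception:
                    # Keep the failed block so it is resubmitted next round.
                    failed.append(index)
        pending = failed

    if pending:
        raise RuntimeError(f"blocks {sorted(pending)} failed after retries")
    return [masks[i] for i in range(len(blocks))]


if __name__ == "__main__":
    demo = [[[1.0, 9.0], [6.0, 2.0]], [[0.0, 0.0]]]
    print(flag_blocks(demo, workers=2))
```

Passing the detector as an argument mirrors the pluggable design in spirit only; a production system would also need to handle device placement, I/O, and persistent failures, which this sketch omits.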