Sorting plastic waste (PW) from municipal solid waste (MSW) by material type is crucial for reuse and pollution reduction. However, current automatic separation methods are costly and inefficient, so a more advanced sorting process is needed to ensure high feedstock purity. This study introduces a Swin Transformer-based model that detects PW in real-world MSW streams by leveraging both morphological and material properties. To support this task, a dataset comprising 3560 optical images and infrared spectra was created. The vision-based system localizes and classifies PW into five categories: polypropylene (PP), polyethylene (PE), polyethylene terephthalate (PET), polyvinyl chloride (PVC), and polystyrene (PS). Performance evaluations show an accuracy of 99.75% and a mean Average Precision at an intersection-over-union (IoU) threshold of 0.5 (mAP50) exceeding 91%. Compared with popular convolutional neural network (CNN)-based models, the trained Swin Transformer-based model offers greater convenience and better performance on the five-category PW detection task, and it maintains an mAP50 above 80% in real-world deployment. Its effectiveness is further supported by visualizations of detection results on MSW streams and by principal component analysis (PCA) of the classification scores. These results demonstrate the system's effectiveness under both laboratory-scale and real-world conditions. The approach aligns with global regulations and strategies that promote innovative plastic recycling technologies, thereby contributing to the development of a sustainable circular economy.
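The mAP50 figures above depend on how predicted boxes are matched to ground truth. The abstract does not state the exact evaluation protocol. The following is therefore a minimal sketch that assumes one common definition: greedy matching of score-sorted detections at IoU ≥ 0.5, followed by Pascal VOC-style all-point interpolated AP, averaged over classes. COCO-style tools use 101-point interpolation and can give slightly different numbers. All function names are hypothetical and illustrative, not the authors' code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format
Pred = Tuple[str, float, Box]            # (image_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one class (assumed VOC-style all-point interpolation)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp: List[int] = []
    # Highest-confidence detections claim ground-truth boxes first.
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        if best >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # false positive: low IoU or duplicate detection
    # Cumulative precision and recall along the ranked list.
    precisions, recalls, ctp = [], [], 0
    for k, t in enumerate(tp, start=1):
        ctp += t
        precisions.append(ctp / k)
        recalls.append(ctp / n_gt)
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Make precision monotonically non-increasing (precision envelope).
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def map50(preds_by_class: Dict[str, List[Pred]],
          gts_by_class: Dict[str, Dict[str, List[Box]]]) -> float:
    """Mean of per-class AP at IoU 0.5, e.g. over PP, PE, PET, PVC, PS."""
    aps = [average_precision(preds_by_class.get(c, []), gts, 0.5)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps)
```

For example, a single class with two ground-truth boxes, where the top-scoring detection is a false positive and the next two are correct matches, yields an AP of 2/3. Ranking errors at high confidence therefore lower mAP50 even when every object is eventually found.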