Processing food waste is crucial for environmental conservation and resource recovery, but inadequate sorting allows inorganic waste to mix with food waste. The resulting mixed stream reduces the efficiency of food waste treatment facilities, and preliminary sorting relies heavily on manual labor. To address the challenge of a non-homogeneous food-inorganic waste stream, this study proposes a vision-based sorting system. A real-life Mixed Food-Inorganic Waste (MFIW) dataset containing over 13,000 samples and four categories of inorganic waste was created. Based on an analysis of the dataset, a waste detection model using Deformable Convolution v3 was employed, and appropriate positioning and classification algorithms were chosen for optimal detection performance. The waste detection model achieves an mAP50 of 85.21%, and the average recalls for packages, trash bags, and animal bones each exceed 94%. Additionally, the model runs in real time at 33.61 FPS, highlighting its suitability for industrial applications.
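To make the reported metrics concrete, the following minimal sketch illustrates how recall at an IoU threshold of 0.5 (the threshold underlying mAP50) and the per-frame latency implied by a frame rate might be computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names `iou`, `recall_at_iou`, and `ms_per_frame`, the `(x1, y1, x2, y2)` box format, and the greedy score-ordered matching are all hypothetical choices.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(
    gt: List[Box],
    preds: List[Tuple[Box, float]],
    thr: float = 0.5,
) -> float:
    """Fraction of ground-truth boxes of one class matched by a prediction.

    Predictions are (box, score) pairs and are matched greedily in
    descending score order; each ground-truth box is matched at most once.
    """
    matched = set()
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, g in enumerate(gt):
            if j in matched:
                continue
            overlap = iou(box, g)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
    return len(matched) / len(gt) if gt else 0.0


def ms_per_frame(fps: float) -> float:
    """Average processing time per frame, in milliseconds, for a given FPS."""
    return 1000.0 / fps
```

Under these assumptions, a frame rate of 33.61 FPS corresponds to roughly 29.75 ms of processing time per frame.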