Cooking is a major source of indoor particulate matter (PM), which poses significant health risks. Existing studies lack an economical and effective analytical framework for quantifying inhalable particles (PM10) and fine particulate matter (PM2.5) from residential cooking at scale under real-world conditions. This study addresses that gap using computer vision (CV) and readily available sensors. Over one month in real-world settings, we collected cooking videos and air quality data (indoor PM10, PM2.5, CO2, temperature, and relative humidity, plus outdoor PM10 and PM2.5 concentrations). To classify activities as high-emission (pan-frying, stir-frying, deep-frying) or low-emission (stewing, steaming, boiling, non-cooking), we developed and validated a CV model, “Cooking-I3D,” built on a pre-trained Two-Stream Inflated 3D ConvNet (I3D) architecture. We then assessed how well the CV-predicted cooking method characterizes PM using a first-order multivariate autoregressive model that controls for environmental factors. The Cooking-I3D model achieved an accuracy of 95 % and an area under the curve (AUC) of 0.98. Our results indicate that a single 6-minute high-emission cooking event triggers a 21–25 % increase in indoor PM concentrations and a 23–24 % increase in the indoor/outdoor ratio, with relative errors in these estimates ranging from 10 to 21 %. The method offers a tool for long-term assessment of cooking-related indoor air pollution and supports precise exposure assessment in human health studies.
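To make the autoregressive step concrete, the following is a minimal sketch of one way a first-order autoregressive model with covariates could be fitted; it is not the authors' implementation. It assumes a log-transformed PM series, a binary high-emission cooking indicator, and outdoor PM as the only environmental control, none of which the abstract specifies. All data are synthetic, and the function names and the coefficients `phi` and `beta` are hypothetical values chosen for illustration only.

```python
import numpy as np


def simulate_log_pm(n_steps, phi, beta, rng):
    """Simulate synthetic log indoor PM (illustrative values, not study data)."""
    # Assumption: roughly 10 % of time steps contain a high-emission cooking event.
    high_emission = (rng.random(n_steps) < 0.1).astype(float)
    # Assumption: log outdoor PM fluctuates around a fixed level.
    log_outdoor = rng.normal(3.0, 0.3, n_steps)
    log_pm = np.zeros(n_steps)
    log_pm[0] = 3.0
    for t in range(1, n_steps):
        log_pm[t] = (
            0.5
            + phi * log_pm[t - 1]      # first-order persistence
            + beta * high_emission[t]  # effect of the cooking indicator
            + 0.1 * log_outdoor[t]     # environmental control
            + rng.normal(0.0, 0.05)    # measurement noise
        )
    return log_pm, high_emission, log_outdoor


def fit_ar1_with_covariates(log_pm, high_emission, log_outdoor):
    """Ordinary least squares fit of log_pm[t] on its lag and the covariates."""
    y = log_pm[1:]
    X = np.column_stack(
        [np.ones(len(y)), log_pm[:-1], high_emission[1:], log_outdoor[1:]]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # order: intercept, lag coefficient, cooking effect, outdoor effect


rng = np.random.default_rng(0)
log_pm, cook, outdoor = simulate_log_pm(5000, phi=0.6, beta=0.2, rng=rng)
coef = fit_ar1_with_covariates(log_pm, cook, outdoor)

# On a log scale, a coefficient b corresponds to a 100 * (exp(b) - 1) % change.
pct_increase = 100.0 * (np.exp(coef[2]) - 1.0)
print(f"lag coefficient: {coef[1]:.3f}")
print(f"estimated increase per high-emission step: {pct_increase:.1f} %")
```

Under the log-scale assumption, the coefficient on the cooking indicator converts directly into a percentage change in PM, which is the form in which the abstract reports its effect sizes. The percentage printed here reflects only the synthetic `beta` and says nothing about the study's estimates.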