Smoke detection plays an essential role in video surveillance systems deployed in the wild, where it provides early warning of abnormal events. In this paper, we introduce a dedicated neural network structure named Sniffer-Net that simultaneously extracts smoke dynamic features robustly and evaluates smoke concentration accurately. First, we use an improved LiteFlowNet to estimate the global optical flow from an image sequence. A Marr–Hildreth method is fused into this network to distinguish occluded regions and eliminate them from the global flow map. Then, an evaluation module based on the Context-Encoder network is proposed specifically to quantify smoke concentration levels. This network, which follows the improved LiteFlowNet, is modified by replacing the loss function and removing the multiscale scheme, and is trained to infer the approximate smoke optical flow behind occluded regions. From a statistical point of view, the irregular RGB/HSV feature spaces are converted into a specific quantitative evaluation space. As a result, the whole evaluation system transforms the distribution of irregular smoke motion features into a quantified representation. In turn, this transformation provides the system with a novel numerical standard for smoke concentration evaluation. Finally, an accuracy assessment method compares the detected smoke concentration with a prior model built from human experience, which yields the accuracy and false detection rate of the algorithm. In experiments on five smoke datasets, the proposed smoke detection approach outperforms other state-of-the-art methods, and the concentration algorithm achieves 97.3% accuracy on a specialized dataset.
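For readers unfamiliar with the Marr–Hildreth method mentioned above, the following is a minimal NumPy sketch of the classical detector: filter with a Laplacian-of-Gaussian kernel, then mark zero crossings of the response. It is an illustration only, not Sniffer-Net's implementation; the abstract does not specify how the method is fused into the network. The function names `log_kernel` and `marr_hildreth` and the parameters `sigma` and `rel_threshold` are assumptions introduced here.

```python
import numpy as np


def log_kernel(sigma: float) -> np.ndarray:
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    kernel = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    # Truncation leaves a small non-zero sum; remove it so flat regions give ~0.
    return kernel - kernel.mean()


def marr_hildreth(image, sigma: float = 1.5, rel_threshold: float = 0.1) -> np.ndarray:
    """Return a boolean edge map from zero crossings of the LoG response.

    `rel_threshold` (hypothetical parameter) discards weak crossings whose
    jump is below that fraction of the largest absolute response.
    """
    image = np.asarray(image, dtype=float)
    kernel = log_kernel(sigma)
    radius = kernel.shape[0] // 2

    # Filter by sliding the kernel over an edge-padded copy of the image.
    # The kernel is symmetric, so correlation and convolution coincide.
    padded = np.pad(image, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    response = np.einsum("ijkl,kl->ij", windows, kernel)

    threshold = rel_threshold * np.abs(response).max()
    edges = np.zeros(image.shape, dtype=bool)

    # Zero crossing between horizontal neighbours: sign change with a large jump.
    left, right = response[:, :-1], response[:, 1:]
    edges[:, :-1] |= (left * right < 0) & (np.abs(left - right) > threshold)

    # Zero crossing between vertical neighbours.
    top, bottom = response[:-1, :], response[1:, :]
    edges[:-1, :] |= (top * bottom < 0) & (np.abs(top - bottom) > threshold)
    return edges


# Usage on a synthetic vertical step edge (dark left half, bright right half).
step = np.zeros((32, 32))
step[:, 16:] = 1.0
step_edges = marr_hildreth(step)
```

In a pipeline like the one described, such a detector could plausibly be applied to a single-channel map derived from the flow field (for example, flow magnitude) to locate boundaries of candidate occluded regions; that choice of input is an assumption, not something the abstract states.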