Most current approaches to monitoring real-driving NOx emissions rely on direct measurement. However, because sensor-based measurements are uncertain, such methods cannot always reliably screen for malfunctions of an emission control system. This paper proposes a random forest (RF) model that extracts information from on-board diagnostics (OBD) data streams transmitted by a remote emission management vehicle terminal (REMVT), providing a dedicated method for the online screening of high NOx emissions. First, two modeling approaches, random forest and logistic regression (LR), are presented as representatives of nonparametric models and of linear models with a specified functional form, respectively. The two models were trained, validated, and compared using OBD data collected from three China-VI heavy-duty diesel vehicles (HDDVs). The results show that the RF model, as a data-driven, highly adaptive, and robust learning method, identifies abnormal emission states more accurately than the LR model. Finally, for further validation, another China-VI HDDV was tested in two typical states: a fault state and a normal state. The results indicate that the RF model could clearly distinguish the out-of-control emission condition from the normal operating state. This research demonstrates the feasibility of using a machine learning model to process remote OBD data from heavy-duty vehicles and to identify high emissions in an in-use fleet. On this basis, more sophisticated combined and multi-stage models could be developed.
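As a rough illustration of the RF-versus-LR comparison summarized above, the following minimal sketch trains both classifiers on synthetic data using scikit-learn. Everything specific in it is an assumption made for illustration and is not taken from the study: the three feature names (`scr_temp`, `engine_load`, `urea_dosing`), the rule that labels a sample as high-NOx, the sample size, and the hyperparameters. It does not reproduce the paper's OBD data, features, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_obd(n_samples=2000, seed=0):
    """Generate hypothetical OBD-like features and a binary high-NOx label."""
    rng = np.random.default_rng(seed)
    scr_temp = rng.uniform(150.0, 450.0, n_samples)    # assumed SCR temperature, deg C
    engine_load = rng.uniform(0.0, 100.0, n_samples)   # assumed engine load, percent
    urea_dosing = rng.uniform(0.0, 1.0, n_samples)     # assumed normalized dosing rate
    X = np.column_stack([scr_temp, engine_load, urea_dosing])
    # Invented labeling rule (not from the paper): a cold catalyst, or high load
    # combined with low dosing, counts as a high-emission state.
    y = ((scr_temp < 200.0) | ((engine_load > 70.0) & (urea_dosing < 0.3))).astype(int)
    return X, y


def compare_models(seed=0):
    """Fit RF and LR on the same split and return their test accuracies."""
    X, y = make_synthetic_obd(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        # LR is scaled first so the solver converges on features of different ranges.
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: test accuracy = {acc:.3f}")
```

Because the invented labeling rule is built from thresholds and an interaction between features, it favors tree-based models by construction; any accuracy gap seen here says nothing about real OBD data. With real data, accuracy alone would also be a weak criterion if high-emission samples are rare, so precision and recall on the high-emission class would be worth reporting as well.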