Deep learning–based video analytics demands high network bandwidth to ferry the large volume of data when deployed on the cloud. When incorporated at the edge side, only lightweight deep neural network (DNN) models are affordable due to computational constraint. In this article, a cloud–edge collaborative architecture is proposed combining edge-based inference with cloud-assisted continuous learning. Lightweight DNN models are maintained at the edge servers and continuously retrained with a more comprehensive model on the cloud to achieve high video analytics performance while reducing the amount of data transmitted between edge servers and the cloud. The proposed design faces the challenge of constraints of both computation resources at the edge servers and network bandwidth of the edge–cloud links. An accuracy gradient-based resource allocation algorithm is proposed to allocate the limited computation and network resources across different video streams to achieve the maximum overall performance. A prototype system is implemented and experiment results demonstrate the effectiveness of our system with up to 28.6% absolute mean average precision gain compared with alternative designs.