Cyber-physical systems (CPS), a cornerstone of smart cities, have attracted great interest from academia and industry. They aim to monitor and control physical components via communication and computation while ensuring effectiveness, intelligence, and security. Related research has pointed out that state perception of physical devices is a prerequisite for boosting overall CPS performance. To this end, we present an effective deep temporal perception network for classification-based state detection. Specifically, we first design a multi-feature encoding network for multi-view time series representation. On the one hand, we utilize two piecewise aggregate representation strategies to obtain the key temporal trends; on the other hand, we adopt a temporal symbolic representation strategy to capture the necessary contextual semantic correlations. Thereafter, we develop a comprehensive representation enhancement module to improve feature comprehension capability, thereby boosting overall performance and interpretability. Comparison experiments, ablation studies, and data visualization analyses on benchmark datasets verify the effectiveness of our model.
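To make the two representation families named above concrete, the following minimal sketch shows one common form of each: a piecewise aggregate approximation that replaces each segment of a series with its mean, and a SAX-style symbolization that maps those means to letters. This is an illustration under stated assumptions, not the encoding network itself: the abstract does not specify which two piecewise aggregate strategies or which symbolic scheme are used, and the function names, the four-letter alphabet, and the Gaussian breakpoints are all hypothetical choices.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumption: a 4-letter alphabet with breakpoints that split a standard
# normal distribution into four equiprobable regions (as in classic SAX).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def z_normalize(series):
    """Rescale a series to zero mean and unit variance.

    A constant series has zero variance and is mapped to all zeros.
    """
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each segment.

    The series is cut into n_segments contiguous, near-equal chunks.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def symbolize(values):
    """Map each value to the letter of the breakpoint interval it falls in."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in values)


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8]  # toy sensor readings, not real data
    trend = paa(signal, 4)             # coarse temporal trend
    word = symbolize(paa(z_normalize(signal), 4))  # symbolic view
    print(trend, word)
```

In this sketch the segment means give a compact view of the trend, while the symbol string gives a discrete sequence that a downstream model could treat like text. How the actual model combines such views is not described in the abstract.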