Machine learning-based techniques have proven effective for Internet-of-Things (IoT) network behavioral inference. Existing works have developed data-driven models based on features from network packets and/or flows, but mainly in a static and ad-hoc manner, without adequately quantifying their gains versus costs. In this article, we develop a generic architecture that comprises two distinct inference modules in tandem, beginning with IoT network behavior classification followed by continuous monitoring. In contrast to prior work, our generic architecture flexibly accounts for various traffic features, modeling algorithms, and inference strategies. We argue that quantitative metrics are required to systematically compare and efficiently select traffic features for IoT traffic inference. This article makes three contributions: (1) For IoT behavior classification, we identify four metrics, namely cost, accuracy, availability, and frequency, that allow us to characterize and quantify the efficacy of seven sets of packet-based and flow-based traffic features, each resulting in a specialized model. By experimenting with traffic traces of 25 IoT devices collected from our testbed, we demonstrate that specialized-view models can outperform a single combined-view model trained on multiple feature sets combined, in terms of both accuracy and cost. We also formulate an optimization problem that selects the best set of specialized models for multi-view classification. (2) For monitoring the expected IoT behaviors, we develop a progressive system consisting of one-class clustering models (one per IoT class) at three levels of granularity. We develop an outlier detection technique on top of the convex hull algorithm to form custom-shape boundaries for the one-class models. We show how this progression reduces computing costs and improves the explainability of detected anomalies.
(3) We evaluate the efficacy of our optimally selected classifiers versus the superset of specialized classifiers by applying them to our IoT traffic traces. We demonstrate that the optimal set can reduce the processing cost by a factor of six with negligible impact on classification accuracy. We also apply our monitoring models to a public IoT dataset of benign and attack traces and show that they yield an average true-positive rate of 94% and a false-positive rate of 5%. Finally, we publicly release our data (training and testing instances of the classification and monitoring tasks) and our code for convex hull-based one-class models.
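To make the convex hull-based one-class idea in contribution (2) concrete, here is a minimal Python sketch under several assumptions. Features are two-dimensional (the article's feature space may have more dimensions), the hull is computed with Andrew's monotone chain algorithm, and outliers are trimmed by convex hull peeling, which discards the outermost hull layer a fixed number of times. Peeling is a standard technique swapped in for illustration and is not necessarily the authors' outlier detection method. Unlike the custom-shape boundaries described above, the boundary in this sketch is always convex. The class `ConvexHullOneClass` and its `peel_layers` parameter are hypothetical. A test point is treated as expected behavior if it falls inside the trimmed hull.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise (or collinear) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def inside_hull(hull: List[Point], q: Point) -> bool:
    """True if q lies inside or on the boundary of a CCW convex polygon."""
    if len(hull) < 3:
        return q in hull
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


class ConvexHullOneClass:
    """Hypothetical one-class model: benign region = convex hull after peeling."""

    def __init__(self, peel_layers: int = 1):
        self.peel_layers = peel_layers  # hypothetical outlier-trimming knob
        self.hull: List[Point] = []

    def fit(self, points: List[Point]) -> "ConvexHullOneClass":
        remaining = list(set(points))
        for _ in range(self.peel_layers):
            # Remove the outermost hull vertices, which are the most extreme points.
            layer = set(convex_hull(remaining))
            rest = [p for p in remaining if p not in layer]
            if len(rest) < 3:  # stop before the region collapses
                break
            remaining = rest
        self.hull = convex_hull(remaining)
        return self

    def is_normal(self, q: Point) -> bool:
        return inside_hull(self.hull, q)


if __name__ == "__main__":
    # Hypothetical training data: a 5x5 grid of benign points plus one stray point.
    train = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
    model = ConvexHullOneClass(peel_layers=1).fit(train)
    print(model.is_normal((2.0, 2.0)), model.is_normal((10.0, 10.0)))  # prints: True False
```

With `peel_layers=0`, the untrimmed hull stretches out to the stray point at (20, 20), so (10, 10) would be accepted as normal; one round of peeling removes that point and tightens the boundary around the dense region.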
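This abstract does not specify the model-selection optimization in contribution (1). One plausible formulation, assumed here purely for illustration, is a weighted set cover: choose the cheapest subset of specialized models such that every IoT class is recognized by at least one selected model with accuracy at or above a threshold. The sketch below solves it by exhaustive search, which stays tractable for seven feature sets (2^7 - 1 = 127 non-empty subsets). The function `select_models`, the model names, costs, and accuracies are all hypothetical and are not measurements from the article.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple


def select_models(
    costs: Dict[str, float],
    accuracy: Dict[str, Dict[str, float]],
    classes: Set[str],
    min_acc: float,
) -> Optional[Tuple[FrozenSet[str], float]]:
    """Brute-force weighted set cover over specialized models (hypothetical formulation).

    Returns the cheapest subset in which every class is recognized by at least
    one selected model with accuracy >= min_acc, or None if no subset qualifies.
    """
    names = sorted(costs)
    best: Optional[Tuple[FrozenSet[str], float]] = None
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Assumption: each class is handled by its most accurate selected model.
            covers_all = all(
                max(accuracy[m].get(c, 0.0) for m in subset) >= min_acc
                for c in classes
            )
            if not covers_all:
                continue
            total = sum(costs[m] for m in subset)
            if best is None or total < best[1]:
                best = (frozenset(subset), total)
    return best


if __name__ == "__main__":
    # Hypothetical per-model processing costs and per-class accuracies.
    costs = {"dns": 1.0, "ntp": 1.0, "flow_volume": 5.0}
    accuracy = {
        "dns": {"camera": 0.99, "plug": 0.60, "hub": 0.98},
        "ntp": {"camera": 0.50, "plug": 0.96, "hub": 0.50},
        "flow_volume": {"camera": 0.97, "plug": 0.97, "hub": 0.97},
    }
    chosen, cost = select_models(costs, accuracy, {"camera", "plug", "hub"}, 0.95)
    print(sorted(chosen), cost)  # prints: ['dns', 'ntp'] 2.0
```

In this toy instance, two cheap specialized models together cover all three classes at a total cost of 2.0, undercutting the single broader model at 5.0. For many more candidate models, an integer linear program would replace the exhaustive loop.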