In computer networks, traffic measurement is a module in a network probe that measures flow-level statistics from an IP packet stream; these statistics are the basis for network performance monitoring and malicious activity detection. The module extracts the flow IDs from incoming IP packets, classifies packets into flows, and counts the number of packets (or bytes) in each flow. Measuring per-flow statistics on a high-speed network device is a great challenge when only the size-limited SRAM on its line cards is available. Many algorithms that use sublinear memory have therefore been proposed, such as CountMin and CountSketch. However, most previous algorithms are designed for specific measurement tasks. To obtain multiple types of statistics, operators have to deploy multiple sketches, which demands more resources from a network device. It is therefore useful to design a universal sketch that can track not only the top-$k$ largest individual flows (called heavy hitters) but also the overall traffic distribution statistics (called moments). Prior work named UnivMon successfully tackled this ambitious quest. However, it incurs a large and variable per-packet processing overhead, which may become a significant throughput bottleneck on a high-rate packet stream, given that each packet requires 33 hashes and 32 memory accesses on average and many times that in the worst case. To address this performance issue, we fundamentally redesign the solution architecture, moving from hierarchical sampling to a new progressive sampling scheme and from CountSketch to a new GenericCM. Together these ensure that the per-packet overhead is a small constant (5 hashes and 8 memory accesses in the worst case), making the design more suitable for online operation, especially in hardware pipeline implementations.
The new design also aims to reduce the memory footprint or, equivalently, to improve measurement accuracy under the same memory budget. Our experiments show that, given the same 0.2 MB of memory as UnivMon, our solution reduces measurement error by roughly 98.1% for the second-order moment and by 91.5% for entropy.
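To make the terminology above concrete, the following is a minimal sketch of the textbook Count-Min structure together with exact reference computations of the second-order moment and the entropy. It is not the GenericCM or progressive sampling design proposed here, nor UnivMon. The class and function names, the string flow IDs, the BLAKE2b-based row hashes, and the use of empirical Shannon entropy in bits are all assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


class CountMinSketch:
    """Textbook Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, flow_id: str, row: int) -> int:
        # Illustrative choice: one hash per row, derived by salting
        # BLAKE2b with the row number.
        digest = hashlib.blake2b(
            flow_id.encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, flow_id: str, count: int = 1) -> None:
        # One counter per row is incremented for every packet.
        for r in range(self.depth):
            self.rows[r][self._index(flow_id, r)] += count

    def estimate(self, flow_id: str) -> int:
        # Collisions only add to counters, so the minimum over rows
        # never underestimates the true flow size.
        return min(
            self.rows[r][self._index(flow_id, r)] for r in range(self.depth)
        )


def second_moment(counts: Counter) -> int:
    """Exact second-order moment: sum of squared flow sizes."""
    return sum(c * c for c in counts.values())


def entropy(counts: Counter) -> float:
    """Exact empirical Shannon entropy (bits) of the flow-size distribution."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


# Hypothetical packet stream: three flows with 50, 30, and 20 packets.
stream = ["10.0.0.1->10.0.0.9"] * 50 + ["10.0.0.2->10.0.0.9"] * 30 \
    + ["10.0.0.3->10.0.0.9"] * 20

sketch = CountMinSketch(width=64, depth=4)
exact = Counter()
for flow_id in stream:
    sketch.update(flow_id)
    exact[flow_id] += 1

# Top-k heavy hitters ranked by the sketch estimates (k = 2 here).
top_k = sorted(exact, key=sketch.estimate, reverse=True)[:2]
```

In this toy setting the exact counters are kept alongside the sketch purely for comparison; on a real line card only the sketch would fit in SRAM, which is why the moments and entropy must be estimated rather than computed exactly.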