With the development of the Internet of Things (IoT), wireless camera networks have been widely deployed owing to their low deployment and maintenance costs and their flexibility. However, it is challenging for wireless camera networks to provide energy-efficient and robust video transmission, since multihop wireless communication offers limited bandwidth and low link quality. Moreover, the camera nodes are usually battery-powered. State-of-the-art coding schemes adopt complex predictive encoding methods, which leads to high complexity and power consumption. In this paper, we propose a novel tensor-encoder (i.e., a randomly generated mask) for energy-efficient and robust video transmission over unreliable wireless networks. First, we adopt a novel algebraic framework, namely, the low-tubal-rank tensor model, to capture the strong spatiotemporal correlations within video data. Second, we design a mask-encoder, modeled as a mask-sampling process, that dramatically reduces the transmission burden, and we propose an alternating minimization algorithm as the corresponding mask-decoder. Third, we prove that the proposed decoder converges exponentially fast to the global optimum. For an $n \times n \times t$ video stream with tubal-rank $r \ll n$, the required sampling complexity is $O(nr^2 t \log^3 n) \ll n^2 t$ and the computational complexity is $O(n^2 r t \log^2 n)$. Finally, based on synthetic data, real-world data, and our wireless camera network testbed, we compare the proposed scheme with existing methods and achieve high-quality video transmission at a compression ratio of 20 percent.
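To make the encoder side concrete, the following is a minimal sketch (not the authors' implementation) of the mask-sampling process described above: the encoder is simply a randomly generated binary mask applied entrywise to the video tensor, so only the sampled entries are transmitted. The function name `mask_encode`, the RNG seed, and the synthetic 64x64x30 tensor are illustrative assumptions; the 20 percent sampling rate matches the compression ratio reported in the abstract.

```python
# Sketch of the mask-sampling encoder for an n x n x t video tensor.
# Assumption: the mask is i.i.d. Bernoulli(rate); the paper's decoder
# would recover the full tensor from (mask, samples) by exploiting the
# low-tubal-rank structure via alternating minimization.
import numpy as np

def mask_encode(video: np.ndarray, rate: float = 0.2, seed: int = 0):
    """Sample entries of `video` with probability `rate`.

    Returns the binary mask and the retained entries; unsampled
    entries are zeroed out and never transmitted.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(video.shape) < rate  # random binary mask
    samples = video * mask                 # keep only sampled entries
    return mask, samples

# Usage: any array of shape (n, n, t) illustrates the encoder side.
video = np.random.rand(64, 64, 30)
mask, samples = mask_encode(video, rate=0.2)
print(f"transmitted fraction: {mask.mean():.2f}")  # approximately 0.20
```

Note that the encoder performs no prediction or transform coding at all, which is the source of the claimed energy savings on battery-powered camera nodes; the computational burden is shifted entirely to the decoder.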