Smart buildings integrate users’ digitized wearables with their physical surroundings to create a seamless, interactive user experience. They do so by combining multiple sensors, video streaming, artificial intelligence, and edge computing, which together gather extensive data and support a wide range of applications, such as 3D audio/video in AR/VR, localization, virtual tours, and vigilant monitoring. However, current AR/VR devices are limited by the bulky on-body sensing hardware they rely on, such as headsets and specialized glasses, which becomes uncomfortable during prolonged use. This makes it challenging to build an immersive system that combines lightweight interaction with high-quality presentation. This paper presents a comprehensive system for lightweight immersive interaction in smart buildings. The system consists of three components: (1) a lightweight panoramic imaging framework that addresses the challenges of hardware size and functionality; (2) a learning-based video transcoding cost prediction framework for efficient load balancing; and (3) a layered networking architecture that facilitates high-quality mobile panorama live streaming. Together, these components offer lightweight interaction paired with enhanced presentation quality. Our experimental results demonstrate the effectiveness of the system design, showing that it operates seamlessly across different times, geographical locations, and heterogeneous wireless networks.