With the popularity of smartphones, a large number of "phubbers" have emerged who are engrossed in their phones regardless of the situation. In response to the potential dangers that phubbers face while traveling, this paper proposes a multimodal danger perception network model and early warning system for phubbers, designed for mobile devices. This proposed model consists of surrounding environment feature extraction, user behavior feature extraction, and multimodal feature fusion and recognition modules. The environmental feature module utilizes MobileNet as the backbone network to extract environmental description features from the rear-view image of the mobile phone. The behavior feature module uses acceleration time series as observation data, maps the acceleration observation data to a two-dimensional image space through GADFs (Gramian Angular Difference Fields), and extracts behavior description features through MobileNet, while utilizing statistical feature vectors to enhance the representation capability of behavioral features. Finally, in the recognition module, the environmental and behavioral characteristics are fused to output the type of hazardous state. Experiments indicate that the accuracy of the proposed model surpasses existing methods, and it possesses the advantages of compact model size (28.36 Mb) and fast execution speed (0.08 s), making it more suitable for deployment on mobile devices. Moreover, the developed image-acceleration multimodal phubber hazard recognition network combines the behavior of mobile phone users with surrounding environmental information, effectively identifying potential hazards for phubbers.