To reduce the waste of fresh food, an e-commerce company in South Korea uses lateral transshipment within a network of online platforms and offline shops, which is called the online–offline channel system (OOCS). Although the OOCS has been successful in practice, how to derive a transshipment policy for this system leaves room for further study. This study therefore develops a solution approach that derives a promising transshipment policy, and it analyzes the impact of transshipment in the OOCS. The main contributions are as follows. First, we propose a model of the proactive transshipment of perishable products in the OOCS. In particular, to the best of our knowledge, this is the first study to introduce the concept of heterogeneous shelf life, which reflects the different properties of the online and offline channels. Second, we develop a hybrid deep reinforcement learning (DRL) approach that combines the soft actor–critic algorithm with two novel acceleration methods. The method obtains a promising policy without assuming a demand distribution, and it mitigates the computational burden by reducing the action space. In experiments on real-world demand data, the transshipment policy derived from the hybrid DRL approach achieved a higher profit than the existing algorithms it was compared against. Third, we examine the impact of transshipment under different types of demand and different values of the unit transshipment cost. We find that transshipment substantially reduces the outdating cost because it allows the offline channel to make use of old products that would otherwise be discarded in the online channel; this finding is new to the literature.
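The mechanism behind the last finding (old online stock being used offline instead of discarded) can be illustrated with a toy simulation. This is a minimal sketch under hypothetical assumptions, not the study's model or its learned policy. It assumes that heterogeneous shelf life means a unit stays sellable for fewer days online (`life_online`) than offline (`life_offline`), that each channel issues stock oldest-first, and that a simple rule-based stand-in for proactive transshipment moves online units on their last sellable online day to the offline shop. All names, quantities, and shelf lives are illustrative. The study itself learns the transshipment decision with the hybrid DRL approach rather than using a fixed rule.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """FIFO stock of one channel; each batch is [age_in_days, units] (hypothetical model)."""
    shelf_life: int  # assumed number of days a unit remains sellable in this channel
    batches: list = field(default_factory=list)

    def receive(self, units, age=0):
        if units > 0:
            self.batches.append([age, units])
            self.batches.sort(key=lambda b: -b[0])  # keep oldest batches first

    def sell(self, demand):
        """Serve demand oldest-first and return the number of units sold."""
        sold = 0
        for batch in self.batches:
            take = min(batch[1], demand - sold)
            batch[1] -= take
            sold += take
        self.batches = [b for b in self.batches if b[1] > 0]
        return sold

    def age_and_outdate(self):
        """Age all batches by one day; discard units that reach the shelf life."""
        outdated, kept = 0, []
        for age, units in self.batches:
            if age + 1 >= self.shelf_life:
                outdated += units
            else:
                kept.append([age + 1, units])
        self.batches = kept
        return outdated


def transship_old_units(online, offline):
    """Hypothetical rule: move online units that would expire online tonight
    but are still within the offline shelf life; return units moved."""
    moved, keep = 0, []
    for age, units in online.batches:
        if online.shelf_life <= age + 1 < offline.shelf_life:
            offline.receive(units, age)  # age is preserved after the move
            moved += units
        else:
            keep.append([age, units])
    online.batches = keep
    return moved


def simulate(days, online_order, offline_order, online_demand, offline_demand,
             life_online, life_offline, transship):
    """Return (outdated units, sold units, transshipped units) over the horizon."""
    online, offline = Channel(life_online), Channel(life_offline)
    outdated = sold = moved = 0
    for t in range(days):
        online.receive(online_order)
        offline.receive(offline_order)
        sold += online.sell(online_demand[t]) + offline.sell(offline_demand[t])
        if transship:
            moved += transship_old_units(online, offline)
        outdated += online.age_and_outdate() + offline.age_and_outdate()
    return outdated, sold, moved
```

For example, with illustrative values (daily orders of 10 online and 4 offline, daily demand of 7 online and 6 offline, and shelf lives of 2 days online and 4 days offline over 5 days), `simulate(5, 10, 4, [7]*5, [6]*5, 2, 4, transship=False)` returns `(5, 55, 0)`, while `transship=True` returns `(0, 57, 5)`. The 5 units that would have expired online are sold offline instead, so nothing is outdated. Units still on hand at the end of the horizon are not counted as outdated.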