Reference prices, the psychological price benchmarks that consumers form by repeatedly purchasing their desired products or services, have a significant impact on customers' purchase behavior and firms' operational strategies. To determine pricing and ordering strategies that maximize a retailer's total discounted revenue, we therefore study a joint pricing and inventory management system under reference price effects over an infinite horizon. The system involves uncertain market turbulence and heterogeneous customer sensitivities to gains and losses (loss-averse, gain-seeking, and loss-neutral). We aggregate these factors into a general value-function model with only a few realistic constraints on the variables and structural parameters. A deep reinforcement learning approach based on the Double Deep Q-Network with a target network (TN-DDQN) algorithm is proposed and forms the core of the expert decision system. Two ground-truth algorithms (value iteration and a real-demand-response policy) and two classical RL algorithms (Double Q-learning and Q-learning) are compared with TN-DDQN in discrete and continuous state spaces, respectively. Through a sequence of experiments, we find that the retailer should not ignore the impact of current prices on future demand: a myopic policy reduces the retailer's profits through reference price effects. Moreover, if customers have a strong memory of past prices, the retailer should lower sales prices and raise the order-up-to level accordingly. Our system with the TN-DDQN algorithm provides a new way to handle complicated problems at the intersection of behavioral science and operations management, and it can be applied in broader settings.
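The core of the TN-DDQN approach named above is the Double-DQN target rule: the online network selects the next action and the target network evaluates it, which reduces the overestimation bias of plain Q-learning. A minimal sketch of that target computation follows; the function name, toy batch, and discount factor are illustrative assumptions, and the full algorithm additionally trains the networks by gradient descent and periodically synchronizes the target network with the online one.

```python
import numpy as np

def double_dqn_targets(rewards, gamma, q_online_next, q_target_next, terminal):
    """Double-DQN target: the online net picks the greedy next action,
    the (periodically synced) target net evaluates that action."""
    best_actions = np.argmax(q_online_next, axis=1)                    # selection (online net)
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]   # evaluation (target net)
    return rewards + gamma * evaluated * (1.0 - terminal)              # bootstrap unless terminal

# Toy batch of two transitions with two actions per state (hypothetical numbers).
rewards = np.array([1.0, 0.5])
q_online_next = np.array([[0.2, 0.8],
                          [0.6, 0.1]])
q_target_next = np.array([[0.3, 0.5],
                          [0.4, 0.2]])
terminal = np.array([0.0, 0.0])

targets = double_dqn_targets(rewards, 0.9, q_online_next, q_target_next, terminal)
# First transition: online net picks action 1, target net values it at 0.5,
# so the target is 1.0 + 0.9 * 0.5 = 1.45.
```

In a full TN-DDQN training loop, these targets would serve as the regression labels for the online network on a sampled replay batch.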