The ability of a humanoid robot to display human-like facial expressions is crucial to natural human–computer interaction. To fulfill this requirement for the imitative humanoid robot XIN-REN, an automatic facial expression learning method is proposed. In this method, first, a forward kinematics model, which is designed to reflect the nonlinear mapping relationship between servo displacement vectors and corresponding expression shape vectors, is converted into a linear relationship between the mechanical energy of servo displacements and the potential energy of feature points, based on the energy conservation principle. Second, an improved inverse kinematics model is established under the constraints of instantaneous similarity and movement smoothness. Finally, online expression learning is employed to determine the optimal servo displacements for transferring the facial expressions of a human performer to the robot. To illustrate the performance of the proposed method, we conduct evaluation experiments on the forward kinematics model and the inverse kinematics model, based on data collected from the robot's random states as well as from fixed procedures designed by animators. Further, we evaluate the facial imitation ability with different values of the weighting factor, according to three sequential indicators (space similarity, time similarity, and movement smoothness). Experimental results indicate that the deviations in mean shape and position do not exceed 6 pixels and 3 pixels, respectively, and the average servo displacement deviation does not exceed 0.8%. Compared with other related studies, the proposed method maintains better space–time similarity with the performer, while ensuring a smoother trajectory for multiframe sequential imitation.
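The abstract does not give the exact formulation of the inverse kinematics model, but the trade-off it describes (instantaneous similarity to the performer's expression versus inter-frame movement smoothness, balanced by a weighting factor) can be illustrated with a minimal sketch. Everything below is an assumption for exposition: the linear mapping `A`, the weighting factor `lam`, the servo count, and the optimizer are hypothetical, not the paper's actual implementation.

```python
# Illustrative sketch only: A, lam, the servo count, and the optimizer are
# assumptions, not the paper's actual formulation.
import numpy as np
from scipy.optimize import minimize

def solve_servo_displacements(A, target_shape, prev_disp, lam):
    """Find servo displacements q that (i) reproduce the performer's expression
    shape under an assumed linear mapping A @ q ~ target_shape (instantaneous
    similarity) and (ii) stay close to the previous frame's displacements
    (movement smoothness), traded off by the weighting factor lam."""
    def objective(q):
        similarity_term = np.sum((A @ q - target_shape) ** 2)  # space similarity
        smoothness_term = np.sum((q - prev_disp) ** 2)         # inter-frame smoothness
        return similarity_term + lam * smoothness_term

    res = minimize(objective, x0=prev_disp)
    return res.x

# Hypothetical usage: 12 servos, 30 tracked feature-point coordinates
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 12))   # assumed linear shape-per-displacement mapping
target = rng.normal(size=30)    # performer's expression shape vector for this frame
prev = np.zeros(12)             # previous frame's servo displacements
q_opt = solve_servo_displacements(A, target, prev, lam=0.1)
```

A larger `lam` favors smoother multiframe trajectories at the cost of per-frame similarity, which is the trade-off the abstract evaluates when varying the weighting factor.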