Supervised representation learning based on the teacher-student framework can extract quality-related features for soft sensors: the teacher network extracts representation information that the student network then uses as supervision information. In traditional applications, the teacher network is large and difficult to train, so it is conventionally pre-trained. However, pre-training the teacher network is unnecessary when the training process is not complicated, which makes it worthwhile to optimize the teacher and student networks jointly. In our application, the teacher-student framework is used to extract quality-related representation information for soft sensors. The objective is to maximize the mutual information between the representation information and the supervision information, where the mismatch between the distributions of the observed information and the supervision information is modeled as isotropic Gaussian noise. Under some approximating assumptions, this objective is decoupled so that the model parameters can be updated by an alternating iteration method. The proposed quality-related feature extraction method is combined with a traditional just-in-time learning method for soft sensing. Our experiments show that the prediction performance of our representation extraction method is better than that of other existing representation extraction algorithms.
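The sketch below illustrates two ingredients named above under stated assumptions; it is not the authors' implementation. First, if the teacher representation is modeled as the student representation plus isotropic Gaussian noise, t = z + ε with ε ~ N(0, σ²I), then log q(t | z) = -‖t - z‖²/(2σ²) - (d/2) log(2πσ²). In the standard variational lower bound I(T; Z) ≥ H(T) + E[log q(t | z)], the second term therefore depends on the representations only through their squared error, so with H(T) held fixed, maximizing the bound reduces to minimizing mean squared error. Second, a generic just-in-time (JIT) learner fits a small local model for each query using only its nearest neighbors in feature space. The names `gaussian_log_likelihood` and `jit_predict`, the noise scale `sigma`, the neighbor count, and the choice of local ridge-regularized linear regression are hypothetical choices; the features passed to `jit_predict` stand in for the student network's outputs.

```python
import numpy as np


def gaussian_log_likelihood(t: np.ndarray, z: np.ndarray, sigma: float) -> float:
    """Average log q(t | z) under the assumed model t = z + eps, eps ~ N(0, sigma^2 I).

    t: teacher representations, shape (n, d); z: student representations, same shape.
    Up to the constant term, this is -(per-sample squared error) / (2 sigma^2).
    """
    n, d = t.shape
    sq_err = np.sum((t - z) ** 2, axis=1)  # squared Euclidean error per sample
    const = 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)  # Gaussian normalizer
    return float(np.mean(-sq_err / (2.0 * sigma ** 2) - const))


def jit_predict(f_train, y_train, f_query, n_neighbors=10, ridge=1e-6):
    """Hypothetical just-in-time (local) linear regression in feature space.

    For each query, select the n_neighbors closest training features (Euclidean
    distance) and fit a ridge-regularized linear model with intercept on them only.
    """
    f_train = np.asarray(f_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    f_query = np.atleast_2d(np.asarray(f_query, dtype=float))
    preds = np.empty(len(f_query))
    for i, q in enumerate(f_query):
        dist = np.linalg.norm(f_train - q, axis=1)
        idx = np.argsort(dist)[:n_neighbors]  # local neighborhood of this query
        A = np.hstack([np.ones((len(idx), 1)), f_train[idx]])  # intercept column first
        reg = ridge * np.eye(A.shape[1])
        reg[0, 0] = 0.0  # do not penalize the intercept
        coef = np.linalg.solve(A.T @ A + reg, A.T @ y_train[idx])
        preds[i] = np.concatenate(([1.0], q)) @ coef
    return preds
```

Because the local model is refit for every query, a JIT learner can follow nonlinear or drifting process behavior at the cost of one small linear solve per prediction; better features make the Euclidean neighborhoods more relevant to the quality variable.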