With the development of 3D technology and the growing number of 3D models, 2D image-based 3D model retrieval has drawn increasing attention from researchers. Previous works align cross-domain features via adversarial domain alignment and semantic alignment. However, the features extracted by these methods are contaminated by residual domain-specific information, and the lack of labels for 3D models makes semantic alignment challenging. We therefore propose disentangled feature learning combined with enhanced semantic alignment to address these problems. On the one hand, disentangled feature learning decouples the entangled raw features into separate domain-invariant and domain-specific features; the domain-specific features are discarded during adversarial domain alignment and semantic alignment, so that only domain-invariant features are aligned. On the other hand, we mine semantic consistency by pulling each 3D model sample closer to its nearest neighbors, which further enhances semantic alignment for the unlabeled 3D model domain. We conduct comprehensive experiments on two public datasets, and the results demonstrate the superiority of the proposed method. In particular, on the MI3DOR-2 dataset, our method outperforms the current state-of-the-art methods by 2.88% on the strictest retrieval metric, nearest neighbor (NN).
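The neighbor-based semantic consistency idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the exact loss, so the choices here are assumptions made only for illustration: cosine similarity to find neighbors, mean squared Euclidean distance as the compactness term, and the hypothetical function names `nearest_neighbors` and `neighbor_compactness_loss`. It is not the paper's implementation.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def nearest_neighbors(features, k):
    """Return normalized features and, per sample, the indices of its k
    most cosine-similar other samples (assumed neighbor criterion)."""
    feats = [l2_normalize(f) for f in features]
    neighbors = []
    for i, fi in enumerate(feats):
        sims = [
            (sum(a * b for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(feats)
            if j != i  # a sample is never its own neighbor
        ]
        sims.sort(key=lambda t: -t[0])  # most similar first
        neighbors.append([j for _, j in sims[:k]])
    return feats, neighbors


def neighbor_compactness_loss(features, k=2):
    """Mean squared distance between each unlabeled sample and its k
    nearest neighbors; minimizing it pulls neighborhoods together."""
    feats, neighbors = nearest_neighbors(features, k)
    total, count = 0.0, 0
    for i, idxs in enumerate(neighbors):
        for j in idxs:
            total += sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
            count += 1
    return total / count if count else 0.0
```

In a training setup this term would be computed on the domain-invariant features of unlabeled 3D models and added to the alignment objectives; a lower value means each sample sits closer to its neighbors in feature space.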