The central challenge of unsupervised symmetric heterogeneous cross-domain adaptation is to train a model on the source domain and apply the learned knowledge to the target domain. Most existing algorithms for unsupervised transfer learning construct a subspace of the source-domain and target-domain features for training. This is computationally expensive, as most of these techniques require labeled source data. Many techniques also lose the original character of the features in both domains. This paper considers the feature vectors of both the source and target domains and trains on the similarity between exemplar (feature) vectors of different instances, known as the Instance Similarity Feature (ISF). We propose a vectorization method for computing this feature similarity. The exemplar vectors are chosen randomly for the target datasets. Hence, to acquire relevant factual data in the knowledge base for training, we work to increase the domain separation error between source and target instances. To avoid the instability caused by poor exemplar vector selection, K-means clustering is applied after the feature-similarity step, giving the K-means Instance Similarity Feature (KISF). Many existing transfer learning techniques operate on the original feature set, which can cause degeneracy and thereby reduce accuracy. To overcome the limitations of existing approaches, we introduce three novel optimized models: KISF with Ant Lion Optimizer (KISFA), KISF with Particle Swarm Optimization (KISFP), and KISF with Biogeography-Based Optimization (KISFB). Because high dimensionality can impair the efficacy of a model, feature selection is performed with these nature-inspired optimizers: Ant Lion Optimizer, Particle Swarm Optimization, and Biogeography-Based Optimization.
We measure the performance of the proposed models using Support Vector Machine, Logistic Regression, Random Forest, Naive Bayes, K-Nearest Neighbor, and Decision Tree classifiers, with Accuracy and F1-score as fitness functions. Extensive experiments are performed on four datasets with 50 iterations. The proposed approach is compared with eleven other techniques and outperforms all of them in average Accuracy. Validation is performed using 10-fold cross-validation. A statistical test using ANOVA indicates that our technique is significantly better than the other techniques.
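As an illustration only, the idea of representing each instance by its similarity to a set of exemplar vectors, with K-means used to stabilize the choice of exemplars, might be sketched as follows. This is not the paper's implementation. It assumes that both domains share the same feature dimensionality, that the exemplars are K-means centroids fitted on the target instances, and that similarity is measured with a Gaussian kernel. The function names `kmeans` and `similarity_features`, the parameter `gamma`, the synthetic data, and the ordering of the clustering and similarity steps are all hypothetical choices made for the sketch.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k centroids (used here as exemplars)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct instances picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every instance to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def similarity_features(X, exemplars, gamma=1.0):
    """Map each instance to its Gaussian similarity to every exemplar.

    The Gaussian kernel is an assumption of this sketch, not the paper's
    stated similarity measure.
    """
    sq_dists = ((X[:, None, :] - exemplars[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)


# Synthetic stand-ins for source and target instances (hypothetical data).
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(40, 5))
X_target = rng.normal(0.5, 1.0, size=(30, 5))

# Exemplars are cluster centroids of the target data instead of random picks.
exemplars = kmeans(X_target, k=4)

# Both domains are re-expressed in the same k-dimensional similarity space.
Z_source = similarity_features(X_source, exemplars, gamma=0.1)
Z_target = similarity_features(X_target, exemplars, gamma=0.1)
```

In such a sketch, `Z_source` and `Z_target` would replace the original feature matrices as classifier input. A feature-selection step driven by a nature-inspired optimizer could then operate on the columns of these matrices.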