Abstract
Principal Component Analysis (PCA) and Shannon entropy are among the most widely used methods for feature extraction and selection. PCA projects the data onto a lower-dimensional subspace by computing the eigenvalues and eigenvectors of the covariance matrix, thereby reducing the features to a smaller number that captures the significant information. Shannon entropy uses the probability distribution of the data to quantify information content. Information gain indicates the importance of a given attribute in a set of feature vectors. This paper introduces a hybrid technique, Info_PCA, which combines the properties of information gain and PCA to reduce dimensionality and thereby increase the accuracy of the machine learning technique. It also demonstrates the individual application of information gain for feature selection and of PCA for dimensionality reduction on two different datasets collected from the UCI Machine Learning Repository. One of the major aims is to determine which attributes in a given set of training feature vectors are important for differentiating the classes. The paper presents a comparative analysis of the classification accuracy obtained by applying information gain, PCA, and Info_PCA individually to the two datasets, each followed by an artificial neural network (ANN) classifier. The hybrid technique Info_PCA achieves the highest accuracy and the lowest loss among the feature extraction techniques compared.
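As an illustration of the information gain step only, the following minimal sketch ranks discrete attributes by how much they reduce the Shannon entropy of the class labels. The function names, the toy data, and the feature names are hypothetical and are not taken from the paper. The abstract does not state how Info_PCA combines this ranking with PCA, so that step is not shown.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Entropy H(Y) = -sum(p * log2(p)) of the empirical class distribution."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(Y) - sum_v P(X=v) * H(Y | X=v) for one discrete attribute X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


# Toy data, made up for illustration (not from the UCI datasets in the paper).
labels = ["yes", "yes", "no", "no"]
features = {
    "informative": ["a", "a", "b", "b"],    # splits the classes perfectly
    "uninformative": ["a", "b", "a", "b"],  # tells us nothing about the class
}

gains = {name: information_gain(col, labels) for name, col in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
```

Attributes at the top of `ranked` are the ones that best separate the classes. A feature selection step would keep those with the highest gain, with the cutoff chosen by the practitioner.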