Automatic classification of stellar spectra is an important part of astronomical data processing and underpins the study of stellar evolution and the measurement of stellar parameters. Carbon star spectra are rare, so classifying them places higher demands on the efficiency and accuracy of classification methods. Traditional manual classification is slow and inaccurate and can no longer meet the practical need to classify massive numbers of stellar spectra automatically, especially spectra with low signal-to-noise ratios; machine learning algorithms have therefore been widely applied to stellar spectral classification. A distinctive feature of stellar spectra is their high dimensionality. Dimensionality reduction both extracts features and reduces computational cost, making it the first task in spectral classification. Traditional linear dimensionality reduction methods such as principal component analysis (PCA) reduce the spectra based only on variance, so different types of spectra overlap after projection into the low-dimensional feature space, whereas manifold learning can produce well-separated classification boundaries that avoid this overlap and facilitate subsequent classification. Given the high dimensionality of spectral data, we investigate the distribution of spectral data in high-dimensional space and the principle by which manifold learning reduces the dimensionality of high-dimensional linear data, compare the effect of two dimensionality reduction methods, t-SNE and PCA, on spectral data, and finally analyze the experimental results and validate them using several machine learning classifiers.
The algorithm is implemented in Python with the third-party scikit-learn library, and experiments are performed on 1000 low signal-to-noise carbon star spectra from LAMOST, achieving high-accuracy automatic processing and classification of the spectral data. The experimental results show that, for dimensionality reduction of spectral data, the manifold-learning-based t-SNE method can recover the low-dimensional manifold structure in the high-dimensional spectral data, and that after feature extraction a machine learning classifier achieves satisfactory classification accuracy on the test dataset.
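The comparison described in this abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the synthetic data from `make_classification` merely stands in for the LAMOST spectra, and the sample count, feature count, perplexity, and the k-nearest-neighbors classifier are all assumptions chosen for illustration. Because scikit-learn's `TSNE` cannot project unseen samples, the embedding here is computed on all samples before the train/test split, which is a simplification.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for high-dimensional spectra (assumption: 300 samples,
# 200 flux-like features, 3 classes). Real spectra would be loaded instead.
X, y = make_classification(
    n_samples=300,
    n_features=200,
    n_informative=10,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Standardize each feature so that no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two dimensions with a linear method (PCA) and a manifold method (t-SNE).
# t-SNE has no transform() for new data, so both embeddings use all samples.
embeddings = {
    "PCA": PCA(n_components=2, random_state=0).fit_transform(X_scaled),
    "t-SNE": TSNE(
        n_components=2, perplexity=30.0, init="pca", random_state=0
    ).fit_transform(X_scaled),
}


def evaluate(embedding, labels):
    """Train a k-NN classifier on part of the embedding and score the rest."""
    emb_train, emb_test, y_train, y_test = train_test_split(
        embedding, labels, test_size=0.3, stratify=labels, random_state=0
    )
    clf = KNeighborsClassifier(n_neighbors=5).fit(emb_train, y_train)
    return clf.score(emb_test, y_test)


accuracies = {name: evaluate(emb, y) for name, emb in embeddings.items()}
for name, acc in accuracies.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

The accuracies printed on synthetic data say nothing about the results reported for carbon star spectra; the sketch only shows the shape of the pipeline (scale, reduce, classify) under the stated assumptions.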