Abstract

In spectroscopy, matching a measured spectrum against the reference spectra in a large database is often computationally intensive. To address this problem, we propose a novel fast search algorithm that finds the most similar spectrum in the database. The proposed method is based on the principal component transformation and returns the same result as the traditional full search. To reduce the search range, hierarchical clustering divides the spectral data into multiple clusters according to spectral similarity, so that the search can start at the cluster closest to the input spectrum. In addition, a pilot search is applied in advance to further accelerate the search. Experimental results show that the proposed method requires only a small fraction of the computation required by the full search and that it outperforms previous methods.
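
The following is a minimal sketch of why a search in a principal component space can return the same result as a full search. It is not the authors' implementation. It assumes squared Euclidean distance as the similarity measure, and the names `fit_rotation` and `fast_search` and the `chunk` parameter are hypothetical. The hierarchical clustering and pilot search steps are not reproduced here. The idea shown is that a full orthonormal rotation preserves distances, so the distance accumulated over the leading components is a lower bound on the full distance, and a candidate can be rejected as soon as that partial sum reaches the best distance found so far.

```python
import numpy as np


def fit_rotation(db):
    """Return the database mean and a full orthonormal PCA basis (d x d).

    Columns of the basis are principal axes sorted by decreasing variance.
    """
    mean = db.mean(axis=0)
    centered = db - mean
    # eigh returns orthonormal eigenvectors of the symmetric scatter matrix.
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def fast_search(query_t, db_t, chunk=8):
    """Exact nearest-neighbor search with early rejection.

    query_t and db_t are already rotated into the principal component space.
    `chunk` (an illustrative assumption) sets how many components are
    accumulated between checks against the current best distance.
    """
    best_idx, best_d2 = -1, np.inf
    for i, row in enumerate(db_t):
        d2 = 0.0
        for start in range(0, row.size, chunk):
            diff = query_t[start:start + chunk] - row[start:start + chunk]
            d2 += float(diff @ diff)
            if d2 >= best_d2:
                break  # partial distance already too large: reject candidate
        else:
            # All components were accumulated and d2 is below the current best.
            best_idx, best_d2 = i, d2
    return best_idx, best_d2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    database = rng.normal(size=(500, 64))   # synthetic stand-in for spectra
    query = rng.normal(size=64)

    mean, axes = fit_rotation(database)
    idx, d2 = fast_search((query - mean) @ axes, (database - mean) @ axes)

    # Full search in the original space, for comparison.
    full_idx = int(np.argmin(((database - query) ** 2).sum(axis=1)))
    print(idx == full_idx)
```

Because the principal axes are ordered by decreasing variance, most of the distance tends to accumulate in the first few components, which is what makes early rejection effective on correlated data such as spectra. The synthetic data above is uncorrelated, so it demonstrates correctness only, not the speedup.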

Highlights

  • Spectroscopy techniques, such as infrared and Raman spectroscopy, are increasingly being used to measure and analyze the physical and chemical properties of materials

  • Spectral identification methods can be divided into two categories: classification methods based on machine learning (ML) and algorithms based on the similarity evaluation [3]

  • The execution speed depends on various aspects of the CPU, such as the instruction mix, pipeline structure, cache memory, and the number of cores, rendering it difficult to find an explicit relationship between the search speed and execution time

Introduction

Spectroscopy techniques, such as infrared and Raman spectroscopy, are increasingly being used to measure and analyze the physical and chemical properties of materials. Two types of analysis are associated with these techniques. The first identifies the constituents of a given spectrum, and the second identifies the spectrum itself by comparing it directly with other known spectra in a database [1,2]. This study addresses the second type. Spectral identification methods can be divided into two categories: classification methods based on machine learning (ML) and algorithms based on similarity evaluation [3]. Methods in the first category achieve good classification performance by training an ML-based model on a given database. Good identification performance is expected from these methods if a sufficient number of samples of each spectrum is obtained.
