Abstract

Recent work has advanced the prediction of odor characteristics from the molecular structure parameters of chemicals. Although such parameters are available for individual chemicals, they cannot be used for chemical mixtures. This study presents a computational method for predicting human odor perception from the mass spectra of chemical mixtures such as essential oils. It also proposes a method for obtaining the similarity among odor descriptors even though the dataset contains only binary values. When a database lists a set of odor descriptors for a sample, only binary data are available, so the correlation between similar descriptors is lost, and prediction performance degrades because the similarity among odor descriptors is not taken into account. Because the mass spectra dataset is high-dimensional, we use an auto-encoder to learn a compressed representation of the mass spectra of essential oils in its bottleneck hidden layer. We then perform hierarchical clustering to create groups of odor descriptors with similar odor impressions, using a correlation coefficient matrix based on continuous values as well as natural language processing. This work explains how to overcome the binary value problem and find the similarity among odor descriptors using machine learning with a natural language semantic representation of words. To address the imbalance between the positive and negative classes in both the continuous value-based correlation coefficient model and the word similarity-based model, we use the Synthetic Minority Oversampling Technique (SMOTE). The resulting model allows us to predict human odor perception through computer simulation by forming odor descriptor groups. Accordingly, this study demonstrates the feasibility of combining machine learning with natural language processing and SMOTE to predict odor descriptor groups from the mass spectra of essential oils.
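The oversampling step named in the abstract can be illustrated with a minimal SMOTE-style sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the two-dimensional feature vectors are all hypothetical, chosen only to show how synthetic minority samples are interpolated between existing ones.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical compressed feature vectors for the rare (positive) class.
positives = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
new_points = smote(positives, n_new=4)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies, which is why this kind of oversampling is preferred to simply duplicating rare samples.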

Highlights

  • Of the five major human senses, olfaction and taste are responsible for chemical perception and recognition

  • The number of odor descriptor groups trades off against the accuracy of the model. With five odor descriptor groups, the word similarity model (clusters based on language processing) achieved true-positive and true-negative prediction accuracies of 55.78% and 46.9%, respectively

  • We propose a mathematical model that uses machine learning with a natural language processing method (Fast-Text) to predict human odor impression by forming odor descriptor groups from the mass spectra of chemical mixtures such as essential oils
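The true-positive and true-negative accuracies quoted in the highlights can be computed as in the sketch below. It assumes they mean the fraction of actual positives and actual negatives that were predicted correctly, which the source does not state explicitly. The function name `class_accuracies` and the label vectors are hypothetical.

```python
def class_accuracies(y_true, y_pred):
    """Return (true-positive accuracy, true-negative accuracy) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg

# Hypothetical labels for one odor descriptor group across eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
tp_acc, tn_acc = class_accuracies(y_true, y_pred)
```

Reporting the two rates separately matters when the classes are imbalanced, since a model can reach a high overall accuracy while rarely predicting the minority class correctly.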


Summary

Introduction

Of the five major human senses, olfaction and taste are responsible for chemical perception and recognition. This study compares two methods of clustering odor descriptors: 1) a correlation coefficient matrix based on continuous values and 2) a word similarity matrix obtained using natural language processing. A six-layer neural network model, shown in Fig 5, predicts the odor descriptor group from the mass spectrum of an essential oil. The features extracted from the mass spectra of essential oils serve as the input of the neural network, and the presence or absence of each odor descriptor group is the target output. We trained the four-layer neural network model on those reduced feature vectors to predict the existence of odor descriptor groups, as shown in the middle part of Fig 5. We fix the number of odor descriptor groups Kp3 based on the experimental results.
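As an illustration of how either similarity matrix could be turned into odor descriptor groups, the sketch below runs a small agglomerative (hierarchical) clustering on the distance 1 − similarity. The average-linkage rule, the function name `cluster_descriptors`, the descriptor names, and the similarity values are all assumptions made for this example; the source does not specify the linkage method or give the matrices.

```python
def cluster_descriptors(names, sim, n_groups):
    """Average-linkage agglomerative clustering on distance = 1 - similarity."""
    clusters = [[i] for i in range(len(names))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(1.0 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return [sorted(names[i] for i in c) for c in clusters]

# Hypothetical descriptors and similarity matrix (correlation or word similarity).
names = ["citrus", "lemon", "woody", "earthy"]
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = cluster_descriptors(names, sim, n_groups=2)
```

With these made-up values the routine groups "citrus" with "lemon" and "woody" with "earthy". The same function accepts a correlation-based matrix or a word similarity matrix, so the two clustering methods can be compared by changing only the input.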

Result of predictive model
Findings
Conclusion
