Introduction The electronic nose (E-nose) is a device, which mimics the mammal olfactory, that can be widely used in food quality control, environmental monitoring, human exhaled breath monitoring, and etc. It consists of gas sampling, sensor arrays, and pattern recognition[1-2].Quantum dots (QDs) are generally spherical or quasi-spherical with diameters ranging from 2 to 20 nm, with remarkable surface activity attribute to the quantum effects. Furthermore, the QDs can be processed in solution with excellent thin film properties at room temperature, compatible with various rigid/flexible substrates, which is conducive to large-scale production and low-cost manufacturing[3-4]. In general, metal oxide semiconductor quantum dots have a large potential in gas sensing, especially in E-nose.The most common pattern recognition process used in E-nose is feature extraction, dimensionality reduction, and classification. In feature extraction, some features such as the response, response/recovery time and etc. are extracted from the response curves based on the basic understanding of the gas sensing mechanism. In dimensionality reduction, the Principal Components Analysis (PCA) is often used. While in the classification, Linear Discriminant Analysis (LDA) is often used for final discrimination. In nowadays, deep learning architectures have been widely applied to fields including computer vision, speech recognition, natural language processing, and etc. However, there is little literature introduces deep learning into the E-nose area. Unlike traditional machine learning methods needing to design features manually, deep learning algorithms attempt to learn high-level hierarchical features from mass data, and jointly optimize feature extractors and classifiers that seriously decreases the burden on users[5-7]. Thus, It is believed that with the help of machine learning, the accuracy of E-nose can be highly enhanced.In this work, the E-nose consists of 6 different metal oxide semiconductors such as SnO2 quantum dots, WO3 quantum dots, In2O3 quantum dots, SnO2 hieratical structure by spray pyrolysis, NiO nanoflake, and commercial SnO2 was fabricated. While the machine learning algorithm based on an end-to-end trained combination of deep convolutional and recurrent neural networks was introduced. Five different kinds of Chinese liquors were chosen for the demonstration of the classification. The high accuracy (99.6%) was achieved by this system, which is much better than the traditional pattern recognition method in the same condition. It can also be concluded that the quantum dots sensors contributed more accuracy than others. Method The solvothermal method was employed for the synthesis of colloidal metal oxide semiconductor quantum dots. WCl6 (Aladdin,0.68g) / SnCl4·5H2O (Aladdin,0.6g) / indium acetate (Aladdin,0.292g) / NiCl2·6H2O (Aladdin,0.238g) was dissolved in oleic acid and oleylamine. Before the mixture was transferred into the Teflon-lined stainless steel autoclave, 10 mL of ethanol was added in and stirred. The reaction was kept at 180 oC for 3 h, then the WO3 nanocrystals were centrifugated and washed with toluene and ethanol(1/5, v/v). Finally, the product was dispersed in toluene and N, N-Dimethylformamide (DMF).Gas sensing materials are coated on ceramic substrates(1.0×1.5 mm) with heater by dripping. The four electrodes of the substrate were then welded to the ase to form a single gas sensor element. Annealing was used to enhance the stability of each sensor.The E-nose consisted of gas inlet, sensor array chamber, micro pump, and data acquisition card (Fig. 1). Results and Conclusions We propose a novel machine learning architecture, specifically designed for metal oxide array-based odor recognition. Our algorithm is based on an end-to-end trained combination of deep convolutional and recurrent neural networks, which leverages a 1D Resnet-like network to automatically extract multi-scale features from multi-channel time-series signals, simultaneously, a high-level semantic branch is connected to LSTM to decode the extremely complex and long-term temporal dynamics. The concatenation of local spatial features and global temporal information extremely enhances the performance of multichannel time series recognition. We also integrate the 1D Convolutional Block Attention Module (CBAM) into Resnet-like architecture to further improve network performance. In contrast, we also built a set of experimental frameworks for traditional methods. Ten typical hand-crafted features were fed to PCA for dimensionality reduction and an LDA or Support Vector Machine (SVM) was used for classification.Five different Chinese liquors: ChunGuJiu (CGJ, Class 1), BaiYunBianJianXiang (BYBJX, Class 2), BaiYunBianNongXiang (BYBNX, Class 3), SiTeJiu (STJ, Class 4) and MaoTai (MT, Class 5) have been chosen for the benchmark. In the beginning, the E-nose was stabilized for 10s, then the sample was put in the inlet of the E-nose. After sensing for the 20s, the sample was removed. Each sample has been tested for 50 times in one month. The typical response curves of the sensors array were shown in Fig. 2.The algorithm achieves high recognition rates (accuracy 99.6%) on a challenging set of 5 fine-grained Chinese liquors with severe noise and sensor drift, outperform traditional methods by a large margin (Table 1).