We propose a new biologically inspired approach, nonlinear Hebbian learning (NHL), for acoustic signal recognition in noisy environments. The proposed learning scheme processes both spectral and temporal features of the input acoustic data. Spectral analysis is performed with auditory gammatone filterbanks. Temporal dynamics are captured by analyzing gammatone-filtered feature vectors over multiple temporal frames, yielding a spectro-temporal representation (STR). Given the STR features, the exact acoustic signatures of the signals of interest and the mixing between those signals and noise are generally unknown. Nonlinear Hebbian learning is therefore employed to extract representative independent features from the STRs and to reduce their dimensionality. The extracted independent features of the signals of interest are called signatures. During learning, the synaptic weight vectors between the input and output neurons are adaptively updated. These weight vectors project the data into a feature subspace in which signals of interest are retained while noise is attenuated. Compared with linear Hebbian learning (LHL), which exploits only the second-order moments of the data, NHL involves higher-order statistics; it can therefore capture features that are more statistically independent than those obtained with LHL. Moreover, the nonlinear activation function of NHL can be chosen to match the implicit distribution of many acoustic sounds, making the learning optimal in the sense of mutual information. Simulation results show that the proposed system recognizes signals of interest more accurately than conventional methods under severe noise. One application is moving-vehicle detection: noise-contaminated vehicle sound is recognized while non-vehicle sounds are rejected. When the vehicle sound is contaminated by a human vowel, a bird chirp, or additive white Gaussian noise (AWGN) at SNR = 0 dB, the proposed system reduces the error rate relative to the commonly used feature extraction method, mel-frequency cepstral coefficients (MFCC), by 26%, 36.3%, and 60.3%, respectively, and relative to LHL by 20%, 2.3%, and 15.3%, respectively. Another application is vehicle type identification, where the proposed system outperforms LHL, e.g., a 40% improvement when the sound of a gasoline-powered heavy wheeled car is contaminated by AWGN at SNR = 5 dB. More importantly, the proposed system has been deployed in real-time field testing for months, with the goal of detecting vehicles of any make or model moving on the street at speeds of 10-35 mph. The miss rate is 1-2% when the vehicle sound is contaminated by surrounding noise (human conversation, animal sounds, airplanes, wind, etc.) at SNR = 0-20 dB, and the false alarm rate is around 1%. In summary, this study not only provides an efficient approach for extracting representative independent features from high-dimensional data, but also offers robustness against severe noise.
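As a rough illustration of the pipeline described above, the sketch below stacks filterbank frames into spectro-temporal vectors and applies a Sanger-style nonlinear Hebbian update with a tanh activation to learn a low-dimensional projection. This is a minimal sketch under stated assumptions, not the authors' implementation: the random stand-in for gammatone filterbank outputs, the frame-stacking depth, the number of output neurons, the tanh nonlinearity, and the learning rate are all illustrative choices.

```python
# Minimal sketch (not the paper's exact algorithm): spectro-temporal feature
# stacking followed by a Sanger-style nonlinear Hebbian update with tanh.
import numpy as np

def str_features(spectra, frames_per_block=5):
    """Stack consecutive filterbank frames into spectro-temporal vectors (STR)."""
    n_frames, n_bands = spectra.shape
    blocks = [spectra[i:i + frames_per_block].ravel()
              for i in range(n_frames - frames_per_block + 1)]
    return np.asarray(blocks)  # shape: (n_blocks, frames_per_block * n_bands)

def nonlinear_hebbian(X, n_out=8, lr=1e-3, epochs=20, seed=0):
    """Sanger-style nonlinear Hebbian learning.

    y = tanh(W x); dW_i = lr * y_i * (x - sum_{j<=i} y_j W_j).
    Returns the learned projection W with shape (n_out, dim).
    """
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    W = rng.standard_normal((n_out, dim)) * 0.01
    for _ in range(epochs):
        for x in X:
            y = np.tanh(W @ x)
            # Lower-triangular reconstruction term keeps outputs approximately decorrelated.
            recon = np.cumsum(y[:, None] * W, axis=0)
            W += lr * y[:, None] * (x[None, :] - recon)
    return W

# Toy usage: random frames stand in for gammatone filterbank outputs.
rng = np.random.default_rng(0)
fake_spectra = rng.random((200, 32))      # 200 frames x 32 filterbank bands
X = str_features(fake_spectra)            # spectro-temporal vectors
W = nonlinear_hebbian(X, n_out=8)         # learned synaptic weight vectors
signatures = np.tanh(X @ W.T)             # low-dimensional projected features
print(signatures.shape)                   # (196, 8)
```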