Music-oriented auditory attention detection (AAD) aims to determine which instrument in polyphonic music a listener is attending to by analyzing the listener's electroencephalogram (EEG). However, existing linear models cannot effectively capture the nonlinearity of the human brain, which limits their performance. Therefore, a nonlinear music-oriented AAD model is proposed in this paper. Firstly, an auditory feature and a musical feature are fused to represent the musical sources precisely and comprehensively. Secondly, the EEG is enhanced when the music stimuli are presented in stereo. Thirdly, a neural network architecture is constructed to capture the nonlinear and dynamic interactions between the EEG and the auditory stimuli. Finally, the musical source most similar to the EEG in a common embedding space is identified as the attended one. Experimental results demonstrate that the proposed model outperforms all baseline models: with 1-s decision windows, it reaches accuracies of 92.6% and 81.7% under mono duo and trio stimuli, respectively. Moreover, the model can be easily extended to speech-oriented AAD. This work opens up new possibilities for studies on both brain neural activity decoding and music information retrieval.
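To make the decoding idea concrete, the following is a minimal PyTorch sketch of a similarity-based AAD pipeline of the kind described above: one encoder maps an EEG window into an embedding space, another maps each candidate source's feature representation into the same space, and the source whose embedding is most similar to the EEG embedding is taken as the attended one. All layer sizes, encoder designs, the cosine-similarity measure, and names such as `SimilarityAAD` are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Nonlinear encoder mapping a (channels, time) signal to a fixed-size embedding.

    Hypothetical design: two 1-D convolutions with ReLU, then temporal averaging.
    """

    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(32, embed_dim, kernel_size=9, padding=4),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> (batch, embed_dim)
        h = self.conv(x)
        return h.mean(dim=-1)  # average over time: one embedding per window


class SimilarityAAD(nn.Module):
    """Embeds EEG and candidate sources into a common space, scores by cosine similarity."""

    def __init__(self, eeg_channels: int = 64, feat_channels: int = 2, embed_dim: int = 64):
        super().__init__()
        self.eeg_encoder = Encoder(eeg_channels, embed_dim)
        self.src_encoder = Encoder(feat_channels, embed_dim)

    def forward(self, eeg: torch.Tensor, sources: torch.Tensor) -> torch.Tensor:
        # eeg:     (batch, eeg_channels, time)
        # sources: (batch, n_sources, feat_channels, time), e.g. a fused
        #          auditory + musical feature per instrument (assumed here)
        e = F.normalize(self.eeg_encoder(eeg), dim=-1)  # (batch, d)
        b, n, c, t = sources.shape
        s = self.src_encoder(sources.reshape(b * n, c, t))
        s = F.normalize(s, dim=-1).reshape(b, n, -1)    # (batch, n, d)
        return torch.einsum("bd,bnd->bn", e, s)         # cosine similarities


# Usage: the source with the highest similarity is declared the attended one.
model = SimilarityAAD()
eeg = torch.randn(8, 64, 128)        # e.g. 1-s decision windows at 128 Hz (assumed rate)
sources = torch.randn(8, 2, 2, 128)  # duo condition: two candidate sources per window
attended = model(eeg, sources).argmax(dim=-1)  # (batch,) predicted source index
```

In such a setup, training would typically maximize the similarity between the EEG embedding and the attended source's embedding (e.g. via a cross-entropy loss over the similarity scores), which is one common way to realize the "most similar in the common embedding space" criterion stated above.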