Bioacoustic event detection is the task of identifying specific animal sounds within biological audio recordings. Only specialists can annotate the temporal locations of animal sounds, since labeling requires expert knowledge of biological sounds, and they can provide only a few annotations because the long duration of biological recordings makes labeling highly labor-intensive. The task is therefore framed as few-shot learning, and the prototypical network handles this setting effectively by learning a representation space in which each class is represented by its few given examples. In this work, we utilized the spectro-temporal receptive field (STRF), inspired by the auditory cortex, which responds actively to certain spectro-temporal modulations, as a convolutional kernel of a prototypical network. Bioacoustic events contain rich spectro-temporal modulation, so STRF kernels are expected to capture animal sounds effectively. Moreover, STRF kernels are constructed with fixed shapes rather than trained, so a model utilizing STRF kernels can learn the representation space with fewer parameters. We built a model called Two-Branch STRFNet (TB-STRFNet), in which the STRF branch captures spectro-temporal modulation via STRF kernels and the other branch captures detailed time-frequency information that may be smeared out in the STRF branch. TB-STRFNet outperformed the other models, demonstrating the effectiveness of the auditory-system-inspired method.
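The two ingredients above can be illustrated with a minimal sketch. This is not the paper's actual implementation: the Gabor-style parameterization of the fixed STRF kernel (`rate`, `scale`, `size`) and the helper names are illustrative assumptions; only the general ideas — fixed spectro-temporal modulation filters, and prototypes as class-mean embeddings with nearest-prototype classification — come from the text.

```python
import numpy as np

def strf_kernel(rate, scale, size=16):
    """Hypothetical fixed STRF-like kernel: a 2D Gabor over the
    time-frequency plane, tuned to temporal modulation `rate` and
    spectral modulation `scale`. Fixed (not trained), as in the text."""
    t = np.linspace(-1.0, 1.0, size)      # normalized time axis
    f = np.linspace(-1.0, 1.0, size)      # normalized frequency axis
    T, F = np.meshgrid(t, f)
    envelope = np.exp(-(T**2 + F**2) / 0.5)
    carrier = np.cos(2.0 * np.pi * (rate * T + scale * F))
    k = envelope * carrier
    return k - k.mean()                   # zero-mean modulation detector

def prototypes(embeddings, labels):
    """Class prototype = mean embedding of that class's support examples."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(query, protos):
    """Assign the query to the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))
```

In a prototypical network the embeddings would come from a learned encoder (here, one whose first convolution uses fixed STRF kernels); the sketch applies the prototype logic directly to toy vectors.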