Abstract
We propose a method, called bi-point input, for convolutional neural networks (CNNs) that handle variable-length input features (e.g., speech utterances). Feeding input features into a CNN in a mini-batch unit requires that all features in each mini-batch have the same shape. A set of variable-length features cannot be directly fed into a CNN because they commonly have different lengths. Feature segmentation is a dominant method for CNNs to handle variable-length features, where each feature is decomposed into fixed-length segments. A CNN receives one segment as an input at one time. However, a CNN can consider only the information of one segment at one time, not the entire feature. This drawback limits the amount of information available at one time and consequently results in suboptimal solutions. Our proposed method alleviates this problem by increasing the amount of information available at one time. With the proposed method, a CNN receives a pair of two segments obtained from a feature as an input at one time. Each of the two segments generally covers different time ranges and therefore has different information. We also propose various combination methods and provide a rough guidance to set a proper segment length without evaluation. We evaluate the proposed method on the spoofing detection tasks using the ASVspoof 2019 database under various conditions. The experimental results reveal that the proposed method reduces the relative equal error rate (EER) by approximately 17.2% and 43.8% on average for the logical access (LA) and physical access (PA) tasks, respectively.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.