Abstract

Pitch estimation is the task of finding the most conspicuous frequency in a complex audio signal. Methods based on deep neural networks have significantly increased the accuracy of pitch estimation; however, their reported real-time performance was achieved on high-performance devices. Because pitch estimation is widely used in real-time applications on low-power devices, we propose an efficient method for estimating pitch on edge devices. The network architecture of the proposed method uses a depth-scaling strategy and fully leverages convolutional networks. We further introduce a channel attention mechanism to increase accuracy without increasing computational overhead. We compared the proposed model with state-of-the-art (SOTA) and conventional methods on two public datasets. The experimental results show that the proposed method achieves better classification accuracy than FCNF0++, the best-performing SOTA model. Furthermore, it reduces processing time relative to FCNF0++ by 48% on average across a personal computer and two edge devices. These results confirm that the proposed method efficiently classifies pitch on edge devices.

