This study proposes a voiceprint-based method for sensing microdroplets generated by a coaxial microfluidic device. Microdroplets are widely used in fields such as drug delivery and molecular biology, and real-time sensing of droplet generation is crucial for droplet quality control. Current sensing techniques, such as high-speed vision, are limited by high cost and system complexity. In our approach, voiceprint features were extracted from the sound accompanying microdroplet generation using the short-time Fourier transform (STFT). These features were then used to determine the droplet generation frequency and to detect transitions between generation modes. The method was validated experimentally with a coaxial capillary microfluidic device that generates sub-100-micron droplets by controlling the flow rates of water in the inner capillary and nitrogen gas in the outer capillary. Generation frequencies ranging from hundreds to thousands of hertz were successfully detected. In addition, the proposed voiceprint method achieved real-time detection of both dripping-to-jetting and jetting-to-dripping mode transitions. This work offers a simple, robust, and cost-effective solution for sensing microdroplets generated by microfluidic devices.
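The STFT-based frequency step can be illustrated with a minimal sketch. It assumes that the droplet generation frequency shows up as the dominant spectral peak in each STFT frame of the recorded sound. The function names, the 8 kHz sample rate, the frame and hop sizes, `min_hz`, and the `rel_threshold` jump rule used to flag a possible mode transition are all hypothetical choices for illustration, not the authors' implementation. A synthetic two-tone signal stands in for recorded audio, and a plain-Python DFT keeps the example dependency-free (in practice a library routine such as `scipy.signal.stft` would normally be used).

```python
import cmath
import math


def stft_magnitudes(signal, frame_size, hop):
    """Magnitude spectra (bins 0..frame_size//2) of a Hann-windowed STFT, via a plain DFT."""
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    # Precomputed roots of unity e^{-2*pi*i*m/N}, indexed by (k*n) mod N.
    twiddles = [cmath.exp(-2j * math.pi * m / frame_size) for m in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = [
            abs(sum(segment[n] * twiddles[(k * n) % frame_size] for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def dominant_frequencies(signal, sample_rate, frame_size=256, hop=128, min_hz=50.0):
    """Peak frequency (Hz) of each STFT frame, ignoring bins below min_hz (e.g. DC or hum)."""
    bin_hz = sample_rate / frame_size
    min_bin = math.ceil(min_hz / bin_hz)
    estimates = []
    for spectrum in stft_magnitudes(signal, frame_size, hop):
        peak_bin = max(range(min_bin, len(spectrum)), key=lambda k: spectrum[k])
        estimates.append(peak_bin * bin_hz)
    return estimates


def detect_jumps(freqs, rel_threshold=0.3):
    """Hypothetical transition flag: frame indices where the peak frequency moves by > rel_threshold."""
    return [
        i for i in range(1, len(freqs))
        if abs(freqs[i] - freqs[i - 1]) > rel_threshold * freqs[i - 1]
    ]


if __name__ == "__main__":
    fs = 8000  # assumed sample rate (Hz)
    # Synthetic stand-in for recorded audio: 500 Hz for 0.25 s, then 1500 Hz for 0.25 s.
    sig = [math.sin(2 * math.pi * (500 if n < 2000 else 1500) * n / fs) for n in range(4000)]
    freqs = dominant_frequencies(sig, fs)
    print(freqs[0], freqs[-1], detect_jumps(freqs))
```

With a 256-sample frame at 8 kHz, each frequency bin spans 31.25 Hz and each hop advances 16 ms, so the frame size trades frequency resolution against how quickly a change in generation frequency can be detected.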