Every empirical research project encounters bottlenecks at various levels. In bioacoustics, one time-consuming bottleneck is the step of transforming a long stream of audio into the acoustic properties of specific sounds. Here, we describe a data-extraction pipeline that integrates manual annotation with Parselmouth’s powerful computational analyses. This semi-supervised method allows the extraction of a large volume of sound features with limited repetitive human “point and click.” We illustrate it using recently published empirical research focused on vocal production learning and plasticity in pinnipeds. In a species capable of imitating sounds, fully automatic methods may misclassify individuals (because of imitation), while the large number of calls makes fully manual approaches suboptimal and error-prone. Focusing on early vocal development, we tested 1–3-week-old harbor seal pups (Phoca vitulina). Noise playbacks served to induce the pups to shift their fundamental frequency: their spontaneous calls were recorded while they were exposed to bandpass-filtered noise that spanned and masked the animals’ fundamental frequency range. After a summary manual annotation of call boundaries, Parselmouth identified these boundaries in the files and automatically extracted multiple sound parameters. Based on this, we found that pups modified their vocalizations by lowering their fundamental frequency in response to noise.
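
The annotate-then-extract step described above might look roughly like the following minimal sketch. It is not the study's actual script. It assumes the manual annotations were exported as a CSV with hypothetical `start` and `end` columns (in seconds), and that `sound` is a `parselmouth.Sound` object. The function names, the returned fields, and the pitch floor and ceiling (Praat's defaults of 75 and 600 Hz, not values tuned for seal pups) are illustrative assumptions.

```python
import csv
import io
import statistics


def read_boundaries(csv_text):
    """Parse annotated call boundaries from CSV text.

    Assumed (hypothetical) layout: a header row with 'start' and 'end',
    then one row per call, with times in seconds.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["start"]), float(row["end"])) for row in reader]


def measure_call(sound, start, end, pitch_floor=75.0, pitch_ceiling=600.0):
    """Measure one annotated call in a parselmouth.Sound.

    The pitch range defaults are Praat's, used here as placeholders.
    """
    # Cut the annotated call out of the long recording.
    call = sound.extract_part(from_time=start, to_time=end)
    # Run Praat's pitch tracker on the excerpt.
    pitch = call.to_pitch(pitch_floor=pitch_floor, pitch_ceiling=pitch_ceiling)
    # Unvoiced frames are reported as 0 Hz, so drop them before averaging.
    voiced = [float(f) for f in pitch.selected_array["frequency"] if f > 0]
    return {
        "start": start,
        "end": end,
        "duration": end - start,
        "mean_f0": statistics.mean(voiced) if voiced else None,
        "min_f0": min(voiced) if voiced else None,
        "max_f0": max(voiced) if voiced else None,
    }


def extract_features(sound, boundaries, **pitch_settings):
    """Apply measure_call to every annotated (start, end) pair."""
    return [measure_call(sound, s, e, **pitch_settings) for s, e in boundaries]
```

With Parselmouth installed (`pip install praat-parselmouth`), a recording could be loaded with `parselmouth.Sound("pup_recording.wav")` (a hypothetical file name) and passed to `extract_features` together with the parsed boundaries. Keeping the human-made boundaries as plain input and leaving only the measurements to the script is the division of labor the pipeline relies on.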