Laryngeal videostroboscopy is an audio-mediated imaging technique that allows the visualization of vocal fold oscillation behavior: the audio signal is used to determine the fundamental frequency F0, which represents the vocal fold oscillation frequency. Knowing F0 makes it possible to trigger the strobe illumination unit so that it provides a still image or a slow-motion view of the vocal fold oscillation. However, this procedure involves several hardware components, noisy audio signals, and a chain of complex, error-prone algorithms that must be orchestrated. We hypothesize that endoscopic images alone suffice to determine F0, with a view towards providing an alternative, image-based approach for estimating F0 during laryngeal videoendoscopy. In this study, we show that we can predict the relative glottal opening state from endoscopic images. These predictions serve as sample points on the glottal area waveform, an endoscopic image-derived signal from which F0 can be derived. The frame rates of ordinary endoscopic cameras are too low to satisfy the Shannon–Nyquist criterion for typical vocal fold oscillation frequencies, so we address this problem with compressed sensing. We developed and evaluated the proposed approach using high-speed videoendoscopy (HSV) to simulate different realistic low frame rates similar to those used in videostroboscopy. We show that we can predict F0 with over 95% accuracy using at most 75 sample points from 600 ms of footage. Using only endoscopic images and our algorithm, we demonstrate that a stroboscopic effect can be achieved. This shows that our proposed method, in combination with the developed algorithm, may be considered for future integration into clinical videostroboscopy.
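The compressed-sensing idea can be illustrated with a minimal sketch. Here F0 is recovered from a handful of sub-Nyquist samples of a glottal-area-like signal. The abstract does not specify the sampling scheme, sparsity basis, or solver, so several pieces below are assumptions:

- The frame times are random rather than uniform.
- The signal is modeled as sparse in a cosine/sine dictionary on a 1 Hz frequency grid.
- The solver is an orthogonal matching pursuit (OMP) variant that selects cosine/sine pairs. This is a standard compressed-sensing solver swapped in for illustration, not necessarily the authors' algorithm.

`synthetic_gaw` and `estimate_f0_omp` are hypothetical helpers. The sample values stand in for the image-based glottal opening predictions.

```python
import numpy as np


def synthetic_gaw(t, f0, harmonics=(1.0, 0.4, 0.15)):
    """Hypothetical glottal-area-like test signal: a DC offset plus a few harmonics of f0."""
    signal = np.full_like(t, 2.0)
    for k, amp in enumerate(harmonics, start=1):
        signal += amp * np.cos(2 * np.pi * k * f0 * t + 0.3 * k)
    return signal


def estimate_f0_omp(t, y, freqs, n_iter=6):
    """Estimate F0 from non-uniform, sub-Nyquist samples (t, y) with an OMP variant.

    Assumption: the zero-mean signal is a sparse combination of cosine/sine pairs
    at the grid frequencies `freqs`. Each iteration adds the frequency whose pair
    best correlates with the residual, then refits all selected amplitudes jointly
    by least squares. The selected frequency with the largest amplitude is
    returned as F0.
    """
    y = y - y.mean()  # remove the DC offset, which carries no F0 information
    phase = 2 * np.pi * np.outer(t, freqs)  # shape: (n_samples, n_freqs)
    cos_raw, sin_raw = np.cos(phase), np.sin(phase)
    # Normalized copies are used only for atom selection
    cos_n = cos_raw / np.linalg.norm(cos_raw, axis=0)
    sin_n = sin_raw / np.linalg.norm(sin_raw, axis=0)

    selected = []
    residual = y
    for _ in range(n_iter):
        score = (cos_n.T @ residual) ** 2 + (sin_n.T @ residual) ** 2
        if selected:
            score[selected] = -np.inf  # never select the same frequency twice
        selected.append(int(np.argmax(score)))
        # Raw (unnormalized) atoms, so the coefficients are signal amplitudes
        basis = np.hstack([cos_raw[:, selected], sin_raw[:, selected]])
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        residual = y - basis @ coef

    n = len(selected)
    amplitude = np.hypot(coef[:n], coef[n:])  # combine cosine and sine parts
    return float(freqs[selected[int(np.argmax(amplitude))]])


# Demo: 75 samples in 600 ms (about 125 frames/s on average) of a 200 Hz signal
rng = np.random.default_rng(0)
duration, n_samples, true_f0 = 0.6, 75, 200.0
t = np.sort(rng.uniform(0.0, duration, n_samples))  # assumed random frame times
y = synthetic_gaw(t, true_f0) + 0.05 * rng.standard_normal(n_samples)
freqs = np.arange(50.0, 1000.0, 1.0)  # assumed candidate F0 grid in Hz
print(estimate_f0_omp(t, y, freqs))
```

The random frame times are the key assumption. With strictly uniform sampling at 125 frames/s, a 200 Hz component aliases to 75 Hz, and no sparse solver could distinguish the two. Irregular sampling breaks this ambiguity, which is what lets a sparse model resolve frequencies far above the average frame rate.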