Fundamental frequency (f0) extraction plays an important role in the processing of monophonic signals such as speech and song. It is essential in various real-time applications such as emotion recognition, speech/singing voice discrimination, and so on. Several f0 extraction methods have been proposed over the years, but no single algorithm works well for both speech and song.

In an earlier efficient method for extracting pitch from speech signals for an HMM-based speech synthesis system, voicing detection and pitch estimation are performed using the mean signal obtained from Continuous Wavelet Transform (CWT) coefficients. Both objective and subjective evaluation results show that the quality of speech synthesized with that pitch estimation method is much better than that of HMM-based speech synthesis systems developed using the state-of-the-art pitch extraction methods employed in HTS, namely the Robust Algorithm for Pitch Tracking (RAPT) and Speech Transformation and Representation using Adaptive Interpolation of weighted spectrum (STRAIGHT). The zero-frequency filter (ZFF) is used to derive the locations of impulse excitation. Another algorithm for estimating the fundamental frequency (F0) of speech or musical sounds is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. That algorithm has several desirable features: its error rates are about three times lower than those of the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal.

In this paper, we propose a novel method that can accurately estimate f0 from speech as well as songs. First, voiced/unvoiced detection is performed using a novel RNN-LSTM based approach. Then, each voiced frame is decomposed into several sub-bands. From each sub-band of a voiced frame, the candidate pitch periods are identified using autocorrelation and non-linear operations. Finally, Viterbi decoding is used to form the final pitch contours.

The performance of the proposed method is evaluated using popular speech (Keele, CMU-Arctic) and song (MIR-1K, Verses) databases. The evaluation results show that the proposed method performs equally well for speech and monophonic songs and outperforms the state-of-the-art methods. Further, the effectiveness of the proposed f0 extraction method is demonstrated by developing an intelligent SARGAM learning tool. The most widely used f0 extraction methods are Praat and RAPT. Compared to these existing methods, the proposed method improves performance by nearly 81.8% and 35.2%.
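
As a rough illustration of the zero-frequency filtering step mentioned above, the sketch below follows the commonly described ZFF recipe (differencing, a cascade of two zero-frequency resonators, repeated local-mean trend removal, and negative-to-positive zero crossings taken as impulse-excitation locations). This is not the implementation used in the paper; the window length and the number of trend-removal passes are assumptions.

```python
import numpy as np

def zff_epoch_locations(speech, fs, window_ms=10.0, trend_passes=3):
    """Return sample indices of impulse-like excitation (epochs) via ZFF.

    window_ms should be roughly one to two average pitch periods (assumed here).
    """
    x = np.diff(speech, prepend=speech[:1]).astype(np.float64)  # difference the signal
    y = x
    for _ in range(4):                     # two 0-Hz resonators = four integrations
        y = np.cumsum(y)
    # Remove the slowly growing trend with a moving-average window,
    # applied a few times as in the usual ZFF recipe.
    win = int(window_ms * 1e-3 * fs) | 1   # force odd window length
    kernel = np.ones(win) / win
    for _ in range(trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Epochs are negative-to-positive zero crossings of the filtered signal.
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
```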
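
The candidate pitch periods for a voiced frame can be obtained from the peaks of the frame's autocorrelation within an admissible lag range. The following minimal sketch illustrates this idea only; the frame length, f0 search range, number of candidates, and the omission of the sub-band decomposition and non-linear operations are simplifying assumptions rather than the paper's exact procedure.

```python
import numpy as np

def pitch_period_candidates(frame, fs, f0_min=60.0, f0_max=500.0, n_candidates=3):
    """Return candidate pitch periods (in samples) for one voiced frame."""
    frame = frame - np.mean(frame)                 # remove DC offset
    ac = np.correlate(frame, frame, mode="full")   # full autocorrelation
    ac = ac[len(frame) - 1:]                       # keep non-negative lags
    ac /= (ac[0] + 1e-12)                          # normalize by lag-0 energy

    lag_min = int(fs / f0_max)                     # shortest plausible period
    lag_max = min(int(fs / f0_min), len(ac) - 2)   # longest plausible period

    # Local maxima of the autocorrelation within the admissible lag range.
    peaks = [lag for lag in range(lag_min, lag_max)
             if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]]
    # The strongest peaks become the candidate pitch periods.
    peaks.sort(key=lambda lag: ac[lag], reverse=True)
    return peaks[:n_candidates]

# Example: a synthetic 200 Hz voiced frame sampled at 16 kHz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
print([fs / lag for lag in pitch_period_candidates(frame, fs)])  # f0 candidates in Hz
```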
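
Finally, the Viterbi decoding step can be pictured as choosing one candidate per frame so that the accumulated cost (candidate strength minus a penalty on large log-f0 jumps between consecutive frames) is minimized. The cost function below is an assumed stand-in for the paper's actual transition and observation costs, and unvoiced frames are not handled.

```python
import numpy as np

def viterbi_pitch_contour(candidates, scores, jump_penalty=4.0):
    """candidates[t], scores[t]: f0 candidates (Hz) and their strengths for frame t."""
    n_frames = len(candidates)
    best_cost = [-np.asarray(scores[0], dtype=float)]
    back = []
    for t in range(1, n_frames):
        prev_f0 = np.asarray(candidates[t - 1], dtype=float)
        cur_f0 = np.asarray(candidates[t], dtype=float)
        # Transition cost: penalize large jumps in log-f0 between frames.
        trans = jump_penalty * np.abs(np.log(cur_f0[None, :]) - np.log(prev_f0[:, None]))
        total = best_cost[-1][:, None] + trans - np.asarray(scores[t], dtype=float)[None, :]
        back.append(np.argmin(total, axis=0))      # best previous candidate for each current one
        best_cost.append(np.min(total, axis=0))
    # Backtrack the minimum-cost path.
    path = [int(np.argmin(best_cost[-1]))]
    for t in range(n_frames - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]

# Example: three frames with two candidates each (true contour near 200 Hz).
cands = [[200.0, 100.0], [202.0, 404.0], [198.0, 99.0]]
strengths = [[0.9, 0.8], [0.9, 0.85], [0.9, 0.8]]
print(viterbi_pitch_contour(cands, strengths))  # -> [200.0, 202.0, 198.0]
```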