Fuzzy Neural Networks (FNNs) make decisions by constructing semi-ellipsoidal clusters in the input space, which serve as the antecedent parts of their fuzzy rules. To determine the output value for an input instance, an FNN considers the instance's membership degree in different sub-regions of the input space. However, forming such meaningful sub-regions is not possible in all applications, owing to the nonlinear interactions among the input variables and their low information gain. Indeed, the samples may be distributed on a manifold in the input space. Covering the input space then requires many rules, each representing a small region, which reduces both the generalization ability and the explainability of the model. Consequently, to form fuzzy rules efficiently, it is first necessary to unfold the manifold by mapping the samples to an appropriate embedding space. The fuzzy rules, in the form of semi-ellipsoidal regions, should then be constructed in this extracted feature space. Deep Fuzzy Neural Networks address this problem through representation learning, stacking multiple cascaded mapping layers. In this paper, we propose a novel approach for nonlinear function approximation and time-series prediction that uses the kernel trick to implicitly learn the mapping function to the new feature space. Moreover, a KNN-based method using the kernel trick is proposed to initialize the fuzzy rules. A hierarchical Levenberg–Marquardt approach is applied to learn the model's parameters. The performance and structure of the proposed method are studied and compared with those of other relevant methods on synthetic and real-world benchmarks. In these experiments, the proposed method achieves the best performance with the most parsimonious architecture among the compared methods.
According to these experiments, the test RMSE of the proposed method is 0.002 for Mackey–Glass chaotic time-series prediction, 0.015 for nonlinear dynamic system identification, 0.0345 for Box–Jenkins nonlinear system identification, 0.0609 for automobile fuel consumption prediction, 10.24 for Sydney stock price tracking, and 0.595 for California housing.
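
To make the role of the kernel trick concrete, the following minimal sketch shows how a fuzzy membership degree and a nearest-neighbor search can be evaluated in an implicit feature space without ever computing the mapping itself. It is an illustration under stated assumptions, not the model proposed in this paper: the RBF kernel, the isotropic Gaussian membership function, rule centers stored as input-space points, and all function and parameter names (`rbf_kernel`, `feature_space_sq_dist`, `rule_memberships`, `kernel_knn`, `gamma`, `sigma`) are hypothetical choices made for the example.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of a and b."""
    a = np.atleast_2d(a)
    b = np.atleast_2d(b)
    # Pairwise squared Euclidean distances in the input space.
    sq = np.sum(a**2, axis=1)[:, None] - 2.0 * a @ b.T + np.sum(b**2, axis=1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def feature_space_sq_dist(x, c, gamma=1.0):
    """Squared distance ||phi(x) - phi(c)||^2 via the kernel trick.

    In general this equals k(x, x) - 2 k(x, c) + k(c, c); for the RBF kernel
    (an assumption of this sketch) k(z, z) = 1, so it reduces to 2 - 2 k(x, c).
    """
    return 2.0 - 2.0 * rbf_kernel(x, c, gamma)

def rule_memberships(x, centers, sigma=0.5, gamma=1.0):
    """Gaussian membership of each sample in each rule, measured in feature space.

    Assumption: an isotropic Gaussian is used here for brevity, whereas the
    text describes semi-ellipsoidal rule regions.
    """
    d2 = feature_space_sq_dist(x, centers, gamma)
    return np.exp(-d2 / (2.0 * sigma**2))

def kernel_knn(x, samples, k=3, gamma=1.0):
    """Indices of the k samples nearest to a single point x in feature space."""
    d2 = feature_space_sq_dist(x, samples, gamma)[0]
    return np.argsort(d2)[:k]

if __name__ == "__main__":
    centers = np.array([[0.0, 0.0], [3.0, 3.0]])
    print(rule_memberships(np.array([0.2, -0.1]), centers))
    print(kernel_knn(np.array([0.1, 0.1]), np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]), k=2))
```

A neighbor search of this kind is one plausible ingredient of a KNN-based rule initialization, for example to estimate a local spread around a candidate center, but the initialization procedure actually used is not specified in this abstract.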