The neonatal period is considered the most crucial phase of a child's physical development and future health. According to the World Health Organization, India has the highest number of preterm births [1], with over 3.5 million babies born prematurely. Up to 40% of them have low birth weight and are highly prone to a multitude of conditions such as jaundice, sepsis, apnea, and other metabolic disorders. Apnea is the primary concern for caretakers of neonates in intensive care units. Real-time medical data is known to be noisy and nonlinear, and addressing the resulting complexity in the classification and prediction of diseases requires optimizing learning models to maximize predictive performance. Our study attempts to optimize neural network architectures to predict the occurrence of apneic episodes in neonates after the first week of admission to the Neonatal Intensive Care Unit (NICU). The primary contribution of this study is the formulation and description of a set of generic steps for selecting model-specific, training, and hyper-parametric optimization algorithms, as well as model architectures, for optimal predictive performance on complex and noisy medical datasets. Because the data used for the study is inherently complex and noisy, Kernel Principal Component Analysis (Kernel PCA) is used to reduce the dimensionality of the dataset for analysis tasks such as interpretation and visualization. Hyper-parametric and parametric optimization is considered in several categories, including learning rate updater algorithms, regularization methods, activation functions, gradient descent algorithms, and network depth. Options in each category are selected based on their performance on the validation set, to obtain a holistically optimized neural network that best models the given complex medical dataset.
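The category-by-category selection described above can be illustrated with a minimal sketch. It assumes a greedy reading of the procedure: one category is varied at a time, the other settings are held fixed, and an option is kept only if it improves the validation score. The function names (`stepwise_select`, `toy_validation_score`), the option names, and the integer scores are all hypothetical placeholders; they are not the study's code, search space, or results.

```python
from typing import Callable, Dict, List

Config = Dict[str, object]


def stepwise_select(
    search_space: Dict[str, List[object]],
    baseline: Config,
    validation_score: Callable[[Config], float],
) -> Config:
    """Greedy selection: tune one category at a time on a validation metric."""
    best = dict(baseline)
    best_score = validation_score(best)
    for category, options in search_space.items():
        for option in options:
            # Change only the current category; keep earlier choices fixed.
            candidate = {**best, category: option}
            score = validation_score(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best


# Hypothetical stand-in for "train the network and score it on the
# validation set". The integers are arbitrary and carry no meaning.
TOY_BONUS = {
    "lr_updater": {"constant": 0, "adam_decay": 3},
    "regularization": {"none": 0, "l2": 1, "dropout": 2},
    "activation": {"sigmoid": 0, "relu": 2},
    "depth": {2: 0, 4: 1, 8: 3},
}


def toy_validation_score(config: Config) -> float:
    return float(sum(TOY_BONUS[key][value] for key, value in config.items()))


search_space = {key: list(options) for key, options in TOY_BONUS.items()}
baseline = {"lr_updater": "constant", "regularization": "none",
            "activation": "sigmoid", "depth": 2}

selected = stepwise_select(search_space, baseline, toy_validation_score)
print(selected)
```

In practice `validation_score` would train a network with the given configuration and return a validation metric such as AUC. A greedy pass of this kind evaluates far fewer configurations than a full grid search, at the cost of ignoring interactions between categories.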
Deep neural network architectures such as Deep Multilayer Perceptrons, Stacked Auto-encoders, and Deep Belief Networks are employed to model the dataset, and their performance is compared with that of the optimized neural network obtained from the parametric exploration. The results are further compared with Support Vector Machine (SVM), K-Nearest Neighbor, Decision Tree (DT), and Random Forest (RF) algorithms. The results indicate that the optimized eight-layer Multilayer Perceptron (MLP) model with Adam Decay and Stochastic Gradient Descent (AUC 0.82) can outperform the conventional machine learning models and perform comparably to the Deep Auto-encoder model (AUC 0.83) in predicting the presence of apnea in neonates. The study shows that an MLP model can achieve significant improvements in predictive performance through the proposed step-wise optimization. On noisy and nonlinear datasets, the optimized MLP is shown to be about as accurate as deep neural network models such as Deep Belief Networks and Deep Auto-encoders, and to outperform all of the conventional models (SVM, DT, K-Nearest Neighbor, and RF). The generic nature of the proposed step-wise optimization provides a framework for optimizing neural networks on such complex nonlinear datasets. The investigated models can assist neonatologists as a diagnostic tool.
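Since the models are compared by AUC, a short sketch of how that metric can be computed may help. It uses the pairwise definition: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's data or evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = apneic episode, 0 = none (made-up values).
toy_labels = [0, 0, 1, 1]
toy_scores_model_a = [0.1, 0.4, 0.35, 0.8]
toy_scores_model_b = [0.2, 0.3, 0.6, 0.9]

print(roc_auc(toy_labels, toy_scores_model_a))
print(roc_auc(toy_labels, toy_scores_model_b))
```

This pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based implementation from an established library would be the usual choice on a real dataset.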