We propose a robust, highly realistic clothing modeling method that generates a 3D clothing model with visually consistent clothing style and wrinkle distribution from a single RGB image. Notably, the entire process takes only a few seconds. Our high-quality results stem from combining learning with optimization, which makes the method highly robust. First, we use neural networks to predict a normal map, a clothing mask, and a learning-based clothing model from the input image. The predicted normal map effectively captures high-frequency clothing deformation from image observations. Then, through a normal-guided clothing fitting optimization, the normal map guides the clothing model to produce realistic wrinkle details. Finally, we apply a clothing collar adjustment strategy that stylizes the result using the predicted clothing mask. The clothing fitting also extends naturally to multi-view inputs, further improving realism without tedious effort. Extensive experiments demonstrate that our method achieves state-of-the-art clothing geometric accuracy and visual realism, and that it adapts robustly to in-the-wild images. In summary, our method provides a low-cost, user-friendly solution for realistic clothing modeling.
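To make the normal-guided fitting step concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the paper's implementation: it optimizes per-vertex offsets of a clothing mesh so that its face normals align with per-face target normals. In the actual method, the targets would instead be sampled from the predicted normal map via a differentiable projection of the mesh into the image; the function name, loss weights, and regularizer here are assumptions for illustration.

```python
import torch

def fit_clothing_to_normals(verts, faces, target_normals, iters=200, lr=1e-2):
    """Hypothetical sketch of normal-guided fitting.

    verts:          (V, 3) float tensor, clothing mesh vertices
    faces:          (F, 3) long tensor, triangle indices
    target_normals: (F, 3) float tensor, stand-ins for normals sampled
                    from the predicted normal map (unit length)
    """
    offsets = torch.zeros_like(verts, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=lr)
    for _ in range(iters):
        v = verts + offsets
        # Face normals from the deformed mesh.
        e1 = v[faces[:, 1]] - v[faces[:, 0]]
        e2 = v[faces[:, 2]] - v[faces[:, 0]]
        n = torch.nn.functional.normalize(torch.cross(e1, e2, dim=1), dim=1)
        # Encourage agreement with the target normals (cosine distance).
        loss_normal = (1.0 - (n * target_normals).sum(dim=1)).mean()
        # Assumed regularizer: keep offsets small so wrinkles stay plausible.
        loss = loss_normal + 10.0 * offsets.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (verts + offsets).detach()
```

Under this sketch, the learned clothing model supplies the initial vertices, and the optimization only has to recover high-frequency wrinkle detail, which is consistent with the few-second runtime the abstract claims.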