Summary

The I/O forwarding layer has become a standard storage layer in today's HPC systems, allowing current storage systems to scale to new levels of concurrency. As the storage hierarchy deepens, I/O requests must traverse several types of nodes to reach the required data: compute nodes, I/O nodes, and storage nodes. This makes it difficult to control the data path and to apply cross‐layer I/O optimizations. In this paper, we propose a well‐coordinated I/O stack. It coordinates the data path between compute nodes and I/O nodes through a job‐level I/O node mapping, which improves load balancing and data locality, and it coordinates the data path between I/O nodes and storage nodes to reduce I/O interference. We implement and evaluate our ideas on Tianhe‐1A using IOFSL, an open‐source I/O forwarding layer. The experimental results show that our proposals significantly improve the I/O performance of multiple I/O kernels and real applications.