With the increasing popularity and applications of 2.5-D integration, both chip and packaging industries are making significant progress in this direction. In advanced high-density 2.5-D packages, package redistribution layers become similar to chiplet back-end-of-line routing layers, and the gap between them scales down with pin density improvement. Chiplet-package interactions become significant and severely affect system performance and reliability. Moreover, 2.5-D integration offers opportunities to apply novel design techniques. The traditional die-by-die design approach neither carefully considers these interactions nor fully exploits the cross-boundary design opportunities. In this article, we present a holistic chiplet-package co-optimization flow for high-density 2.5-D packaging technologies with little performance overhead and zero pipeline-depth increase. Our holistic extraction can capture all parasitics from chiplets and the package and improve system performance through iterative optimizations. Both drop-in and pay-as-you-use design methodologies are implemented for agile development and quick turn-around time. To prove the effectiveness of our flow, we implement several design cases of a microcontroller system in TSMC 65-nm technology. Our design methodologies can reduce the performance gap by 85% with respect to the 2-D reference design after holistic optimizations. We demonstrate design flexibility and development cost-saving by presenting several flavors of a three chiplets system. To validate our flow in silicon, we tape-out a chip in TSMC 65-nm technology with measured data and validated functionality.