This paper proposes an efficient algorithm that maximizes performance under power constraints. It applies to traditional dynamic voltage/frequency (V/F) scaling, to core heterogeneity, and to emerging dynamic micro-architectural adaptation. In all of these scenarios, performance maximization can essentially be viewed as mapping application threads to core states with different power/performance characteristics. Such problems are formulated as a generic 0-1 integer linear program (ILP). The proposed algorithm is an iterative, heuristic-based solution. Compared with the optimal solution produced by a commercial ILP solver, it yields results less than 1% from the optimum on average while running more than two orders of magnitude faster. It scales to systems of 256 cores with a worst-case overhead below 1 ms, so it can be deployed online in heterogeneous systems with hundreds of cores. Its intrinsic history awareness also provides the flexibility to control the costs of switching V/F pairs, migrating threads across cores, or turning micro-architectural resources on and off.

1 A villainous son of Poseidon in Greek mythology who forces travelers to fit into his bed by stretching their bodies or cutting off their legs (adapted from Merriam-Webster).
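To make the thread-to-state mapping concrete, the sketch below assumes the simplest form of the 0-1 ILP: each thread independently picks exactly one state, given as a (performance, power) pair, total power must stay within a budget, and total performance is maximized. The heuristic shown (`greedy_assign`, a marginal-gain-per-watt upgrade loop) is a generic illustration chosen for this example, not the iterative algorithm proposed in this paper, and it ignores history awareness and switching costs. `exhaustive_assign` enumerates all assignments and stands in for an ILP solver on tiny inputs only. All function names and the example numbers are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical model: each thread has candidate core states given as
# (performance, power) pairs; every thread must take exactly one state.
State = Tuple[float, float]


def evaluate(options: List[List[State]], choice: List[int]) -> Tuple[float, float]:
    """Return (total performance, total power) of a thread-to-state assignment."""
    perf = sum(options[i][k][0] for i, k in enumerate(choice))
    power = sum(options[i][k][1] for i, k in enumerate(choice))
    return perf, power


def greedy_assign(options: List[List[State]], budget: float) -> List[int]:
    """Marginal-gain-per-watt heuristic (illustrative, not the paper's algorithm).

    Start each thread in its lowest-power state, then repeatedly apply the
    single-thread state change with the best performance gain per extra watt
    that keeps total power within the budget. Every accepted change strictly
    raises total performance, so the loop terminates.
    """
    choice = [min(range(len(opts)), key=lambda k: opts[k][1]) for opts in options]
    _, power = evaluate(options, choice)
    if power > budget:
        raise ValueError("budget is below the all-lowest-power configuration")
    while True:
        best: Optional[Tuple[float, int, int]] = None  # (gain per watt, thread, state)
        for i, opts in enumerate(options):
            cur_perf, cur_pow = opts[choice[i]]
            for k, (perf, pw) in enumerate(opts):
                d_perf, d_pow = perf - cur_perf, pw - cur_pow
                if d_perf <= 0 or power + d_pow > budget:
                    continue  # no gain, or would violate the power budget
                ratio = d_perf / d_pow if d_pow > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, i, k)
        if best is None:
            return choice  # no feasible improving move remains
        _, i, k = best
        power += options[i][k][1] - options[i][choice[i]][1]
        choice[i] = k


def exhaustive_assign(options: List[List[State]], budget: float) -> List[int]:
    """Exact optimum by enumeration; a stand-in for an ILP solver on tiny inputs."""
    best_perf, best_choice = float("-inf"), None
    for choice in product(*(range(len(opts)) for opts in options)):
        perf, power = evaluate(options, list(choice))
        if power <= budget and perf > best_perf:
            best_perf, best_choice = perf, list(choice)
    if best_choice is None:
        raise ValueError("no feasible assignment")
    return best_choice


# Made-up example: 3 threads, 3 states each, 6.0 W budget.
OPTIONS = [
    [(1.0, 1.0), (1.8, 2.0), (2.4, 3.0)],
    [(1.0, 1.0), (1.5, 2.0), (1.7, 3.0)],
    [(0.5, 0.5), (1.2, 1.5), (2.0, 3.0)],
]
BUDGET = 6.0

if __name__ == "__main__":
    g = greedy_assign(OPTIONS, BUDGET)
    x = exhaustive_assign(OPTIONS, BUDGET)
    print("greedy:", g, evaluate(OPTIONS, g))
    print("exhaustive:", x, evaluate(OPTIONS, x))
```

On this toy instance the greedy loop settles on states [2, 0, 1] (performance 4.6 at 5.5 W), while enumeration finds [1, 0, 2] (performance 4.8 at exactly 6.0 W). The roughly 4% gap only illustrates why a simple one-shot greedy can fall short of the optimum; it says nothing about the accuracy of the algorithm proposed here.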