This paper investigates the capacity scaling laws of two overlaid networks that are located in the same area and share the same wireless resources with different priorities. The primary network can be regarded as an existing communication system operating in a licensed band and is therefore assumed to operate in an order-optimal fashion, achieving its standalone capacity scaling law. The secondary cognitive network must keep its interference to the primary network below a certain threshold while maximizing its own throughput scaling law based on cognition information. Existing scaling results for cognitive networks inherently assume multihop communication, which is a restricted coding model. By contrast, this paper considers a general coding model without any specific physical-layer coding assumptions. The capacity scaling exponents of both networks are analyzed as the number of primary nodes n, the number of primary base stations l (which support communication between primary nodes), and the number of secondary nodes m increase under the relations m = n<sup>β</sup>, β > 1, and l = n<sup>γ</sup>, 0 ≤ γ ≤ 1. For the extended network model, the capacity scaling exponents are completely characterized as max{2-α/2, 1/2, γ} and max{2-α/2, 1/2} for the primary and secondary networks, respectively, where α > 2 denotes the path-loss exponent. That is, the capacity scaling laws for the primary and secondary networks are given, respectively, by n<sup>max{2-α/2, 1/2, γ}±ε</sup> and m<sup>max{2-α/2, 1/2}±ε</sup> for ε > 0 arbitrarily small.
For the dense network model, when the primary network achieves its standalone capacity scaling exponent of 1, the secondary network is shown to achieve a scaling exponent of 1 - 1/(2β), which improves on the previous scaling exponent of 1/2 achieved by multihop. For both models, the conventional multihop approach turns out to be quite suboptimal in general.
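As a rough numerical illustration of the exponents stated above, the following Python sketch evaluates the closed-form expressions for given values of α, β, and γ. The function names are hypothetical and chosen only for this example; the code ignores the arbitrarily small slack in the exponent and says nothing about the coding schemes that achieve these exponents.

```python
def extended_exponents(alpha: float, gamma: float) -> tuple[float, float]:
    """Return (primary, secondary) capacity scaling exponents for the
    extended network model, as stated in the abstract.

    The primary exponent is taken with respect to n, the secondary
    exponent with respect to m. Function name is illustrative only.
    """
    if alpha <= 2:
        raise ValueError("path-loss exponent alpha must exceed 2")
    if not 0 <= gamma <= 1:
        raise ValueError("gamma must lie in [0, 1]")
    # Term shared by both networks: max{2 - alpha/2, 1/2}
    secondary = max(2 - alpha / 2, 0.5)
    # Base stations contribute the extra gamma term for the primary network
    primary = max(secondary, gamma)
    return primary, secondary


def dense_secondary_exponent(beta: float) -> float:
    """Return the secondary exponent 1 - 1/(2*beta) for the dense model,
    which applies when the primary network achieves exponent 1."""
    if beta <= 1:
        raise ValueError("beta must exceed 1")
    return 1 - 1 / (2 * beta)


if __name__ == "__main__":
    print(extended_exponents(alpha=2.5, gamma=0.9))
    print(extended_exponents(alpha=4.0, gamma=0.8))
    print(dense_secondary_exponent(beta=2.0))
```

For any β > 1 the dense-model value 1 - 1/(2β) exceeds 1/2, which is the sense in which the stated result improves on the multihop exponent.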