Understanding why one message becomes more popular than another, and modeling how its popularity grows over time, are of great interest to AI-based decision support systems. Information cascade prediction has begun to benefit from advances in deep learning on graphs. However, recent studies generally learn representations of individual nodes in the graph, which may not be suitable because a cascade comprises all nodes along the dissemination path as a whole. We therefore investigate whether an entire cascade graph can be embedded directly in a low-dimensional space and how effective such an embedding is for predicting future popularity. Rather than learning representations of all nodes in a cascade, we design a framework that learns a low-dimensional representation of each cascade graph by constructing a high-order graph, based on content and structural proximity, in which each node represents a cascade. The embedding of each whole cascade graph is then obtained through random walks and a semi-supervised language model. Our results show that the proposed method reduces the prediction error, measured by RMSPE, by at least 10.29%, 22.89%, and 20.01% on three real-world datasets, respectively, relative to the baselines. Moreover, the training time of the model is much shorter than that of the baselines.
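To make the pipeline concrete, the following is a minimal sketch of two of its steps, not the implementation evaluated in this work: building a cascade-level graph and sampling random walks over it. The abstract does not specify the proximity measures, so the sketch assumes Jaccard similarity over hypothetical `words` (content) and `users` (structure) sets, blended by an assumed weight `alpha` and cut off at an assumed `threshold`. The semi-supervised language model that would consume the walks is not shown. The `rmspe` helper uses the common definition of root mean square percentage error, which is also an assumption here.

```python
import math
import random


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_cascade_graph(cascades, alpha=0.5, threshold=0.2):
    """Link two cascades when their blended proximity exceeds `threshold`.

    `cascades` maps a cascade id to {"words": set, "users": set}.
    The Jaccard measures, `alpha`, and `threshold` are illustrative
    assumptions, not values taken from the paper.
    """
    ids = sorted(cascades)
    graph = {cid: {} for cid in ids}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            content = jaccard(cascades[u]["words"], cascades[v]["words"])
            structure = jaccard(cascades[u]["users"], cascades[v]["users"])
            weight = alpha * content + (1 - alpha) * structure
            if weight > threshold:
                graph[u][v] = weight
                graph[v][u] = weight
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Weighted random walks; each walk is a 'sentence' of cascade ids."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # isolated cascade: the walk stops early
                nodes = list(neighbours)
                weights = [neighbours[n] for n in nodes]
                walk.append(rng.choices(nodes, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


def rmspe(y_true, y_pred):
    """Root mean square percentage error; assumes no true value is zero."""
    terms = [((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(terms) / len(terms))


# Toy data (hypothetical): c1 and c2 overlap, c3 is unrelated.
cascades = {
    "c1": {"words": {"ai", "graph"}, "users": {1, 2, 3}},
    "c2": {"words": {"ai", "graph", "news"}, "users": {2, 3, 4}},
    "c3": {"words": {"sports"}, "users": {9}},
}
graph = build_cascade_graph(cascades)
walks = random_walks(graph)
```

In this toy example only `c1` and `c2` are linked, so walks starting from either alternate between the two, while walks from `c3` contain just `c3`. Treating each walk as a sentence is what would allow a language model to assign one vector per cascade rather than one per user node.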