Environment perception plays a crucial role in enabling collaborative driving automation, which is widely regarded as a ground-breaking solution to the safety, mobility, and sustainability challenges of contemporary transportation systems. Although computer vision for object perception is evolving rapidly, the constrained receptive field and inherent physical occlusion of single-vehicle systems make it difficult for state-of-the-art perception techniques to cope with complex real-world traffic settings. Collaborative perception (CP), built on multiple geographically separated perception nodes, was developed to break this perception bottleneck for driving automation. CP leverages vehicle-to-vehicle and vehicle-to-infrastructure communication so that vehicles and infrastructure can share and fuse information about the surrounding environment beyond their individual line of sight and field of view, thereby enhancing perception accuracy, lowering latency, and removing perception blind spots. In this article, we highlight the need for an evolved version of collaborative perception that addresses the challenges hindering the realization of level 5 autonomous driving (AD) use cases, by comprehensively studying the transition from classical perception to collaborative perception. In particular, we discuss and review perception creation at two different levels: vehicle and infrastructure. Furthermore, we study the enabling communication technologies and three different collaborative perception message-sharing models, compare them with respect to the trade-off between the accuracy of the transmitted data and the communication bandwidth used for transmission, and examine the challenges therein. Finally, we discuss a range of crucial challenges and future directions of collaborative perception that need to be addressed before higher levels of autonomy hit the roads.