This paper proposes a novel method for estimating the motion field between two consecutive LiDAR scans with convolutional neural networks (CNNs), in which object detection, point-wise motion, and object-level motion are learned hierarchically. In the input stage, a unique mapping model describes the LiDAR point cloud and its motion. In the encode network, the two input maps are spatially compressed and merged in a correlation layer. In the decode network, the model outputs object detections, point-wise motion, and object-level motion. Because existing ground-truth datasets are too small to train a CNN, we generate a synthetic Vehicle Motion dataset for training. The dataset contains scenes with several simultaneously moving 3D vehicles recorded by a simulated LiDAR. Evaluation on both real and synthetic datasets shows that the proposed method outperforms other state-of-the-art motion field estimation methods.
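
To make the merging step concrete, the following is a minimal sketch of a correlation layer of the kind commonly used in optical-flow CNNs. It is an illustration under stated assumptions, not the implementation used in this work: the function name `correlation_layer`, the `(C, H, W)` feature layout, the `max_disp` search radius, zero padding, and the channel-mean normalization are all hypothetical choices.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=1):
    """Correlate two feature maps of shape (C, H, W) over a local search window.

    Assumed, illustrative design: for every displacement (dy, dx) with
    |dy|, |dx| <= max_disp, the output channel holds the channel-wise mean of
    f1[:, y, x] * f2[:, y + dy, x + dx], with zeros outside the map.
    """
    assert f1.shape == f2.shape, "both feature maps must have the same shape"
    _, h, w = f1.shape
    d = max_disp
    # Zero-pad the spatial dimensions of the second map so that every
    # shifted window stays in bounds.
    f2_pad = np.pad(f2, ((0, 0), (d, d), (d, d)))
    out = np.empty(((2 * d + 1) ** 2, h, w), dtype=np.float64)
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # View of f2 shifted by (dy, dx), aligned with f1.
            shifted = f2_pad[:, d + dy:d + dy + h, d + dx:d + dx + w]
            out[k] = (f1 * shifted).mean(axis=0)
            k += 1
    return out

# Usage: a single feature peak that moves one cell to the right between scans.
scan_a = np.zeros((1, 3, 4))
scan_b = np.zeros((1, 3, 4))
scan_a[0, 1, 1] = 1.0
scan_b[0, 1, 2] = 1.0
corr = correlation_layer(scan_a, scan_b, max_disp=1)
best = int(np.argmax(corr[:, 1, 1]))  # index of the best-matching displacement
```

In this sketch the output has one channel per candidate displacement, ordered row by row from `(-max_disp, -max_disp)` to `(+max_disp, +max_disp)`, so the strongest channel at a location indicates the most likely local shift between the two input maps.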