With the continued development of China's society and economy, urban traffic problems have become increasingly serious, and analyzing the data generated by a city can help address them. This paper defines key roads in a city as the roads on which data acquisition equipment is installed. Knowing the traffic conditions on the key roads makes it possible to infer the traffic conditions on the other roads in the city. Only a small number of roads then need to be monitored, which reduces the workload of urban data analysis, improves its efficiency and lowers the cost of urban construction. To verify the importance of these key roads, this paper uses traffic flow prediction, a common scenario in traffic problems. Urban road traffic flow is derived mainly from the GPS data of taxis in Changchun City. The main work includes the following aspects: [1] First, a parallel processing method is used to match the taxi GPS data to roads, which greatly improves matching efficiency. Point-of-interest (POI) data, road network data and ground induction coil data for Changchun City are crawled, road flow data is obtained by aggregating the matched GPS data, and the city is divided into grid cells. The ground induction coil data is also matched to roads. Roads equipped with ground induction coils are regarded as key roads selected by human experience and are defined as the initialized key roads. Second, using the road network graph data, embedding features of the roads are obtained with graph embedding models from the family of graph neural network methods. The roads are clustered on the embedding features together with the road attributes, and the graph-neural-network-based key roads are selected according to the clustering results. Third, road flow prediction is divided into a full-data mode and a sparse-data mode.
The full-data mode uses the historical data of all roads, whereas the sparse-data mode uses only the historical data of the key roads. Feature engineering for the sparse-data mode covers five parts: inherent road attribute features, road network graph relationship features, POI features, graph embedding features and key road flow features. A variety of machine learning and deep learning methods are tried in constructing the road flow prediction model. Finally, a total cost evaluation standard for key road selection results is defined, composed of time cost, data cost and precision cost. [2] The time cost is the time taken to train the model, the data cost is the ratio of the number of key roads to the total number of roads, and the precision cost is determined by the accuracy of the road flow prediction model. The total costs of the full-data mode and the sparse-data mode are compared experimentally to verify the feasibility of the sparse-data mode in practical scenarios. Comparing the initialized key roads with the key roads selected by the graph neural network method shows that the latter have a lower total cost, so the method can be used to reduce or optimize the key roads in existing cities and thereby lower the cost of urban construction. This paper establishes a traffic prediction model that estimates the flow on other urban roads from the sparse data of the key roads. Reducing the amount of monitoring equipment lowers the cost of urban construction, and the model can also estimate the flow on roads that previously had no monitoring equipment. Because key road selection requires only road network graph data and road attribute data, the method can also assist design when planning new urban roads, reducing the workload and avoiding errors from manual operation.
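
The total cost described above can be illustrated with a minimal Python sketch. The abstract does not state how the three components are normalized or combined, so everything beyond the three definitions is an assumption made for illustration: the equal weights, the division of training time by a reference time, the use of a normalized prediction error as the precision cost, and the names `SelectionResult`, `data_cost` and `total_cost`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionResult:
    """Outcome of one key road selection experiment (hypothetical container)."""
    name: str
    train_seconds: float     # time cost source: time taken to train the model
    n_key_roads: int         # number of roads selected as key roads
    n_total_roads: int       # total number of roads in the network
    prediction_error: float  # assumed precision cost: normalized error in [0, 1]


def data_cost(result: SelectionResult) -> float:
    """Data cost: ratio of the number of key roads to the total number of roads."""
    if result.n_total_roads <= 0:
        raise ValueError("n_total_roads must be positive")
    return result.n_key_roads / result.n_total_roads


def total_cost(result: SelectionResult,
               reference_seconds: float,
               weights: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of time, data and precision cost.

    Assumption: the training time is divided by a reference time (for example
    the full-data training time) so that all three terms are on a similar scale.
    """
    if reference_seconds <= 0:
        raise ValueError("reference_seconds must be positive")
    w_time, w_data, w_precision = weights
    time_cost = result.train_seconds / reference_seconds
    return (w_time * time_cost
            + w_data * data_cost(result)
            + w_precision * result.prediction_error)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not results from the paper.
    full = SelectionResult("full-data", 100.0, 1000, 1000, 0.10)
    sparse = SelectionResult("sparse-data", 40.0, 200, 1000, 0.15)
    for r in (full, sparse):
        print(r.name, round(total_cost(r, reference_seconds=100.0), 3))
```

Under these assumed weights, a selection that monitors fewer roads and trains faster can reach a lower total cost even if its prediction error is somewhat higher, which is the trade-off the comparison between the full-data and sparse-data modes is meant to expose.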