A Vehicular Ad-Hoc Network (VANET) lets vehicles send and receive environmental and traffic information, making it a crucial component of fully autonomous roads. For VANETs to serve their purpose, coverage must be sufficient even in sparsely populated areas. Moreover, much of the safety information is time-sensitive; excessive delay in data transfer can increase the risk of fatal accidents. Unmanned Aerial Vehicles (UAVs) can serve as mobile base stations to fill coverage gaps, and the placement of these UAVs is crucial to how well the system performs. We are particularly interested in the placement of mobile base stations along a rural highway with sparse traffic, as this represents the worst case for vehicular communication. Instead of heuristic or linear-programming methods for optimal placement, we use multi-agent reinforcement learning (MARL), whose main benefit is that agents learn model-free, directly from experience. We propose a variation of traditional Deep Independent Q-Learning in which the observation function is augmented with information shared directly between neighbouring agents and all agents follow a shared policy. We also implement a custom sparse-highway simulator for training and testing the algorithm. Our tests show that the proposed MARL algorithm learns placement policies that maximize reward across different scenarios while adapting to the varying road densities along the service area. Our experiments also show that the model is scalable: the number of agents can be increased without any modifications to the code. Finally, we show that the model generalizes, as the algorithm can be used directly, and performs equally well, on an industry-standard simulator. Future work includes improving the realism and complexity of the highway models and adapting the algorithm to real-world scenarios.
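To make the two modifications concrete, below is a minimal sketch of independent deep Q-learning with neighbour-augmented observations and a single shared Q-network (parameter sharing). It is an illustration under stated assumptions, not the paper's implementation: the dimensions, network architecture, action set, and all names (SharedQNet, augment, act) are hypothetical.

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of
# independent Q-learning where (1) each agent's observation is augmented with
# features shared by its neighbours and (2) all agents act through one shared
# Q-network. Dimensions and the action set are illustrative assumptions.
import random
import torch
import torch.nn as nn

OBS_DIM = 8        # assumed size of an agent's local observation
NEIGHBOUR_DIM = 4  # assumed size of the features a neighbour shares
N_ACTIONS = 5      # assumed discrete placement actions (e.g. hover, move)


class SharedQNet(nn.Module):
    """One Q-network whose weights are shared by every UAV agent."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + 2 * NEIGHBOUR_DIM, 64),
            nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, augmented_obs: torch.Tensor) -> torch.Tensor:
        return self.net(augmented_obs)


def augment(obs, left_msg, right_msg):
    """Concatenate an agent's own observation with the information
    shared by its left and right neighbours along the highway."""
    return torch.cat([obs, left_msg, right_msg], dim=-1)


def act(qnet, augmented_obs, epsilon):
    """Epsilon-greedy action selection on the shared Q-network."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(qnet(augmented_obs).argmax().item())


if __name__ == "__main__":
    qnet = SharedQNet()
    n_agents = 3  # scaling up needs no code changes: one shared network
    obs = [torch.randn(OBS_DIM) for _ in range(n_agents)]
    msgs = [torch.randn(NEIGHBOUR_DIM) for _ in range(n_agents)]
    zero = torch.zeros(NEIGHBOUR_DIM)  # padding for agents at the edges
    for i in range(n_agents):
        left = msgs[i - 1] if i > 0 else zero
        right = msgs[i + 1] if i < n_agents - 1 else zero
        a = act(qnet, augment(obs[i], left, right), epsilon=0.1)
        print(f"agent {i} -> action {a}")
```

Because every agent queries the same network, adding agents only lengthens the loop over agents, which is consistent with the scalability claim that the agent count can grow without code changes.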