Cooling accounts for more than half of total data center energy expenditure. Thermal imbalance creates hotspot regions, which require additional cooling power. Thermal-aware job scheduling is a well-known software approach to reducing this overhead, but it depends on accurate prediction of server temperatures. Most existing schedulers do not use learning-based methods and instead allocate tasks with logic-based algorithms that follow predefined rules. The few deep-learning-based solutions that have been proposed do not explore alternative models or the data modalities available in data centers, which results in inefficient models; the existing literature proposes only solutions based on unimodal tabular data. We therefore propose a multimodal architecture that exploits the different underlying data modalities in data centers to increase model efficiency and predict server temperatures accurately. In addition, the growing volume of data and the resulting demand for storage and processing have led to the development of distributed data centers. Existing techniques are limited to individual data centers and fail to consider the data privacy restrictions that arise in distributed scenarios. We propose a federated learning architecture that handles distributed data centers efficiently while preserving privacy. Findings from our simulations support the proposed scheme with respect to these objectives: the results show an overall increase in model efficiency in comparison to an existing intelligent solution. Furthermore, we provide comparative results showing that our model achieves lower thermal imbalance than an existing scheme.