With the wide application of big data technology, large volumes of data are generated every day and stored in data centers across geographically distributed regions, waiting to be analyzed by big data tasks. Examples of such data analysis tasks include weather prediction and intelligent healthcare applications. More and more enterprises use clouds because of their nearly unlimited resources, ease of scaling, and other characteristics. Organizations often rent StaaS (Storage-as-a-Service) products offered by cloud providers, such as OSS (Object Storage Service), for massive data storage, while building a big data cluster in the cloud to analyze the data collected from various regions. However, when the data to be analyzed are not located in the rented cluster, efficiently and economically processing the distributed input data stored in clouds becomes an urgent problem. A simple approach to reducing total traffic costs is to cache into the cluster only the frequently accessed data from other regions, rather than all of it. However, future data access patterns are generally very hard to predict, so a rash caching decision may increase costs instead of reducing them. To address this problem, in this paper we propose an online algorithm that guides cloud users toward cost-effective caching decisions without requiring any information about future accesses. We prove that the competitive ratio of our online algorithm is less than 2. Finally, we verify the effectiveness of the proposed algorithm through extensive experiments based on the real prices of Alibaba's public IaaS cloud products, using both the real-world Yahoo S2 data and synthesized datasets.
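To make the cache-or-read-remotely trade-off concrete, the following is a minimal sketch of the classic break-even (ski-rental) rule. It is not the algorithm proposed in this paper, only a well-known online strategy that also achieves a competitive ratio below 2. The cost model is an assumption: each request served from a remote region costs a fixed `remote_cost`, caching costs a one-time `cache_cost`, and storage charges after caching are ignored. The names `BreakEvenCache`, `online_cost`, `offline_optimum`, and `worst_case_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BreakEvenCache:
    """Hypothetical break-even (ski-rental) caching rule for one remote object.

    Assumed cost model: every remote read costs `remote_cost`; copying the
    object into the cluster costs `cache_cost` once; cached reads are free
    (storage charges are ignored in this sketch).
    """
    remote_cost: float
    cache_cost: float
    spent_remote: float = 0.0
    cached: bool = False

    def access(self) -> float:
        """Serve one request and return the cost it incurs."""
        if self.cached:
            return 0.0
        # Cache as soon as one more remote read would bring cumulative
        # remote spending up to the one-time caching cost.
        if self.spent_remote + self.remote_cost >= self.cache_cost:
            self.cached = True
            return self.cache_cost
        self.spent_remote += self.remote_cost
        return self.remote_cost


def online_cost(n_requests: int, remote_cost: float, cache_cost: float) -> float:
    """Total cost paid by the online rule, which never sees future requests."""
    policy = BreakEvenCache(remote_cost, cache_cost)
    return sum(policy.access() for _ in range(n_requests))


def offline_optimum(n_requests: int, remote_cost: float, cache_cost: float) -> float:
    """Cost of an offline policy that knows n_requests in advance."""
    return min(n_requests * remote_cost, cache_cost)


def worst_case_ratio(remote_cost: float, cache_cost: float, max_requests: int = 100) -> float:
    """Largest online/offline cost ratio over 1..max_requests requests."""
    return max(
        online_cost(n, remote_cost, cache_cost) / offline_optimum(n, remote_cost, cache_cost)
        for n in range(1, max_requests + 1)
    )


if __name__ == "__main__":
    # With remote_cost=1 and cache_cost=4, the rule pays 1+1+1 remotely and
    # then 4 to cache (7 in total), while the offline optimum pays 4.
    print(worst_case_ratio(1.0, 4.0))  # 1.75
```

When the caching cost is B times the per-request remote cost, the worst case arises when requests stop right after the object is cached: the online rule pays (2B - 1) units against B for the offline optimum, which gives a ratio of 2 - 1/B.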