An Algorithm for Finding the Minimum Cost of Storing and Regenerating Datasets in Multiple Clouds

The proliferation of cloud computing allows users to flexibly store, recompute or transfer large generated datasets across multiple cloud service providers. However, under the pay-as-you-go model, the total cost of using cloud services depends on the consumption of storage, computation and bandwidth resources, which are the three key cost factors for IaaS-based cloud resources. Given cloud service providers with different pricing models for these resources, users can reduce the total cost of their data by flexibly choosing a cloud service to store a generated dataset, or by deleting it and choosing a cloud service to regenerate it whenever it is reused. However, finding the minimum cost is a complicated and as yet unsolved problem. In this paper, we propose a novel algorithm that calculates the minimum cost of storing and regenerating datasets in clouds, i.e., it determines whether each dataset should be stored or deleted and, furthermore, where to store it or where to regenerate it whenever it is reused. This minimum cost also achieves the best trade-off among computation, storage and bandwidth costs in multiple clouds. Comprehensive analysis and rigorous theorems guarantee the theoretical soundness of the approach, and general (random) simulations conducted with the pricing models of popular cloud service providers demonstrate the excellent performance of our approach.
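The abstract does not spell out the algorithm itself, so the following is only a simplified sketch of the store-versus-regenerate trade-off it describes, not the paper's method. It assumes a hypothetical cost model: datasets form a linear chain in which each one is derived from its predecessor; a stored dataset costs `size * storage` per month in its cloud; each reuse of a deleted dataset recomputes it from the nearest stored predecessor in whichever cloud is cheapest, paying `egress` to move that predecessor if it lives elsewhere. Under these assumptions the minimum monthly cost is a shortest path over a DAG whose nodes are "dataset k stored in cloud c". All class names, fields, prices and functions (`Cloud`, `Dataset`, `regen_cost`, `min_cost_plan`) are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cloud:
    storage: float  # hypothetical $ per GB per month
    compute: float  # hypothetical $ per CPU-hour
    egress: float   # hypothetical $ per GB transferred out of this cloud


@dataclass
class Dataset:
    size: float       # GB
    gen_hours: float  # CPU-hours to derive it from its predecessor
    uses: float       # expected reuses per month


# (dataset index, cloud index); cloud None marks the virtual end node
Node = Tuple[int, Optional[int]]


def regen_cost(ds: List[Dataset], clouds: List[Cloud], j: int, cj: int, i: int, r: int) -> float:
    """Cost of one regeneration of d_i in cloud r, starting from d_j stored in cloud cj."""
    # Assumption: the stored predecessor is shipped to r if it lives elsewhere.
    transfer = ds[j].size * clouds[cj].egress if cj != r else 0.0
    # Every deleted dataset d_{j+1}..d_i must be recomputed in order.
    compute = sum(ds[m].gen_hours for m in range(j + 1, i + 1)) * clouds[r].compute
    return transfer + compute


def min_cost_plan(ds: List[Dataset], clouds: List[Cloud], input_cloud: int):
    """Shortest path over a DAG of 'd_k stored in cloud c' nodes.

    ds[0] is the original input, always kept in input_cloud (its storage is not counted).
    Returns (minimum monthly cost rate, list of (dataset, cloud) pairs to store).
    """
    n = len(ds) - 1
    best: Dict[Node, float] = {(0, input_cloud): 0.0}
    prev: Dict[Node, Node] = {}
    for k in range(1, n + 2):  # k == n + 1 is the virtual end node
        targets: List[Optional[int]] = list(range(len(clouds))) if k <= n else [None]
        for ck in targets:
            store = ds[k].size * clouds[ck].storage if ck is not None else 0.0
            for (j, cj), base in list(best.items()):
                if j >= k:  # edges only go forward along the chain
                    continue
                # Datasets strictly between d_j and d_k are deleted; each reuse
                # regenerates it in whichever cloud is cheapest for that job.
                deleted = sum(
                    ds[i].uses * min(regen_cost(ds, clouds, j, cj, i, r) for r in range(len(clouds)))
                    for i in range(j + 1, k)
                )
                cost = base + deleted + store
                if cost < best.get((k, ck), float("inf")):
                    best[(k, ck)] = cost
                    prev[(k, ck)] = (j, cj)
    # Walk back from the end node to recover which datasets are stored where.
    plan, node = [], prev[(n + 1, None)]
    while node[0] != 0:
        plan.append(node)
        node = prev[node]
    return best[(n + 1, None)], plan[::-1]
```

For example, with two clouds `Cloud(0.1, 1.0, 0.1)` and `Cloud(0.05, 2.0, 0.1)`, an input `Dataset(10, 0, 0)` in cloud 0, a small but expensive-to-compute `Dataset(1, 10, 0)` and a large, cheap-to-compute `Dataset(1000, 1, 5)`, the sketch returns a monthly cost of about 5.1 with the plan `[(1, 0)]`: keep the small intermediate dataset in cloud 0 and regenerate the large one from it on each reuse. Because edges only point forward along the chain, the graph is acyclic and a single pass in index order suffices instead of Dijkstra's algorithm.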

