Exploiting Spatio-Temporal Tradeoffs for Energy Efficient MapReduce in the Cloud

Date of Submission: April 7, 2010
MapReduce is a distributed computing paradigm widely used for building large-scale data processing applications such as content indexing, data mining, and log file analysis. When MapReduce is offered in the cloud, users can construct their own virtualized MapReduce clusters from virtual machines (VMs) managed by the cloud service provider. To keep the cost of such cloud services low, however, cloud operators must optimize the energy consumption of these applications. In this paper, we describe a unique spatio-temporal tradeoff for achieving energy efficiency for MapReduce jobs in such virtualized environments. The tradeoff involves efficient spatial fitting of VMs on servers to achieve high utilization of machine resources, together with balanced temporal fitting, in which each server hosts VMs with similar runtimes so that it runs at high utilization throughout its uptime. To study this tradeoff, we propose a set of metrics that quantify the different sources of resource wastage. We then propose VM placement algorithms that explicitly incorporate these spatio-temporal tradeoffs by combining a recipe placement algorithm for spatial fitting with a temporal binning algorithm for time balancing. We also propose an incremental time balancing (ITB) algorithm that improves energy efficiency further by transparently increasing the cluster size for MapReduce jobs, which also improves their performance. Our simulation results show that our spatio-temporal placement algorithms achieve energy savings of 20-35% over existing spatially efficient placement techniques and come within 12% of a baseline lower-bound algorithm. Further, the ITB algorithm achieves additional savings of up to 15% over the spatio-temporal algorithms by reducing job runtimes by 5-35%.
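The tradeoff can be illustrated with a minimal Python sketch. It is not the paper's recipe placement or temporal binning algorithm: first-fit decreasing stands in for spatial fitting, fixed-width runtime bins stand in for time balancing, and total server uptime stands in for energy. The `VM` fields, the `bin_width` parameter, the unit server capacity, and all function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    cpu: float      # fraction of one server's capacity (assumed unit)
    runtime: float  # estimated runtime in hours (assumed unit)


def first_fit_decreasing(vms, capacity=1.0):
    """Spatial fitting only: place each VM on the first server with room."""
    servers = []
    for vm in sorted(vms, key=lambda v: v.cpu, reverse=True):
        for server in servers:
            if sum(v.cpu for v in server) + vm.cpu <= capacity + 1e-9:
                server.append(vm)
                break
        else:
            # No existing server has room, so power on a new one.
            servers.append([vm])
    return servers


def temporal_bins(vms, bin_width):
    """Group VMs whose runtimes fall into the same fixed-width interval."""
    bins = {}
    for vm in vms:
        bins.setdefault(int(vm.runtime // bin_width), []).append(vm)
    return [bins[key] for key in sorted(bins)]


def spatio_temporal_placement(vms, bin_width, capacity=1.0):
    """Bin VMs by runtime first, then pack each bin spatially."""
    servers = []
    for group in temporal_bins(vms, bin_width):
        servers.extend(first_fit_decreasing(group, capacity))
    return servers


def server_hours(servers):
    """Energy proxy: a server stays on until its longest-running VM ends."""
    return sum(max(vm.runtime for vm in server) for server in servers)


if __name__ == "__main__":
    vms = [VM(0.5, 1), VM(0.5, 10), VM(0.5, 1), VM(0.5, 10)]
    print(server_hours(first_fit_decreasing(vms)))          # spatial only
    print(server_hours(spatio_temporal_placement(vms, 5)))  # binned by runtime
```

With these four hypothetical VMs, both placements use two fully packed servers. Spatial packing alone pairs a 1-hour VM with a 10-hour VM on each server, so both servers stay on for 10 hours (20 server-hours). Binning by runtime first lets one server shut down after 1 hour (11 server-hours). The numbers describe this toy input only and say nothing about the savings reported in the paper.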