Trilogy: Data Placement to Improve Performance and Robustness of Cloud Computing


Infrastructure as a Service, one of the most disruptive aspects of cloud computing, enables a cluster to be configured for each application and each workload. When the workload changes, a cluster will be either underutilized (wasting resources) or unable to meet demand (incurring opportunity costs). Consequently, efficient cluster resizing requires proper data replication and placement. Our work reveals that coarse-grain, workload-aware replication addresses over-utilization but cannot resolve under-utilization. With fine-grain partitioning of the dataset, data replication can reduce both under- and over-utilization. In our empirical studies, compared to naïve uniform data replication, coarse-grain workload-aware replication increases throughput by 81% on a highly skewed workload. A fine-grain scheme reaches a 166% increase. Furthermore, a surprisingly small increase in granularity is sufficient to obtain most of the benefits. Evaluations also show that maximizing the number of unique partitions per node increases robustness to workload deviation, while minimizing this number reduces the storage footprint.

In the 6th Workshop on Scalable Cloud Data Management, held in conjunction with the 2017 IEEE International Conference on Big Data (BIGDATA)