When: Friday, October 13; 12:30 lunch, 12:45 talk
Where: Keller 2-204C
Speaker: Kwangsung Oh
Title: Data Placement in a Multiple Cloud Environment
Today, people's everyday lives are tightly coupled with Internet services (Internet applications). As of 2017, for example, 100 million people watch Netflix in their free time, 150 million people find places to stay through Airbnb while traveling, and 40 million people rely on Uber to get around. Many of these Internet applications are interactive, and their users are geographically distributed across the globe. Providing low user-perceived latency and high service (and data) availability is a mission-critical goal for them, because failing to do so can significantly affect their revenue. To achieve that goal, many applications store (replicate) users' data in multiple geo-distributed data centers of public cloud providers such as Amazon, Microsoft, and Google, keeping data close to users. Cloud providers typically offer many storage services with different characteristics, e.g., performance, durability, and cost. Applications therefore have numerous storage choices, in terms of both storage services and data center locations, to match their needs. Exploiting such diverse storage options, however, brings significant complexity to applications, because each option has a different interface, data model, pricing policy, and geographical location. These storage options allow applications to trade off among metrics such as performance, availability, and monetary cost. To maximize the benefits of these options, applications must answer the question: "Which storage options (which storage services and which data centers) are best for storing data?" Answering this question is challenging because the answer may differ for each application depending on its goals, e.g., SLA, consistency model, and degree of fault tolerance.
Adding to the challenge, dynamics in the cloud environment (e.g., network outages) and in applications (e.g., changes in users' access patterns) make the question even harder to answer. In this talk, I will address these questions (how to exploit diverse storage options easily, where to store data, and how to handle dynamics) through two systems: Wiera, a policy-driven geo-distributed cloud storage system, and TripS, an automated multi-tiered geo-distributed data placement system.