Dependability for Computer Systems Meets Data Analytics
Abstract: We live in a data-driven world, as we are constantly reminded these days. Everything around us generates data, sometimes in large volumes: from the sensors embedded in our physical spaces to the many machines in data centers that are monitored for a wide variety of metrics. The question that we pose is:
Can all this data be used to improve the dependability of computing systems?
Dependability is the property that a system continues to provide its functionality despite faults, whether accidental (design defects, environmental effects, etc.) or maliciously introduced (security attacks, external or internal). The computing systems that we target have been growing in scale, both in the number of executing elements and in the amount of data they must process. For example, the profusion of data-spewing sensors on mobile and embedded devices, multiplied by the sheer number of such devices, illustrates this growth in scale. We have been addressing the dependability challenge through large-scale data analytics in three broad domains: embedded and mobile networks, scientific computing clusters and applications, and computational genomics. In this talk, I will first give a high-level view of the dependability challenges in these three domains, show how data analytics has been brought to bear on them, and present some of our key results.
I will then go into two recent developments: dependability in cellular networks and dependability through approximate computation. In the first, we answer the question: can the cellular network and smart mobile devices, working together, mitigate network outages and reduced data bandwidth? In the second, we answer the question: can the limitations of human perception be leveraged to approximate certain computations, allowing them to meet timing guarantees even when executing on resource-constrained platforms? A common example of such approximation is video processing, where the human visual system is forgiving of certain kinds of inaccuracy.
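To make the idea concrete, here is a minimal sketch (not taken from the talk) of one well-known approximation technique, loop perforation: compute only every k-th output of a video-processing kernel exactly and fill the gaps cheaply, trading a small, often imperceptible, per-pixel error for roughly k-times less work. The frame model and function names are assumptions for illustration.

```python
def exact_blur(frame):
    """Exact 1-D box blur over a row of luminance values (0-255)."""
    n = len(frame)
    out = []
    for i in range(n):
        window = frame[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def perforated_blur(frame, stride=2):
    """Approximate blur via loop perforation: compute every `stride`-th
    output exactly, fill the skipped positions by copying the nearest
    computed value. Does ~1/stride of the exact work."""
    n = len(frame)
    out = [0.0] * n
    last = frame[0]
    for i in range(n):
        if i % stride == 0:
            window = frame[max(0, i - 1):min(n, i + 2)]
            last = sum(window) / len(window)
        out[i] = last  # skipped pixels reuse the last exact result
    return out

# Synthetic luminance row standing in for one scanline of a frame.
frame = [float((i * 7) % 256) for i in range(32)]
exact = exact_blur(frame)
approx = perforated_blur(frame, stride=2)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(f"max per-pixel error: {max_err:.1f} (on a 0-255 scale)")
```

On this smooth synthetic scanline the worst-case error is a few gray levels out of 255, the kind of deviation a viewer is unlikely to notice, while the kernel body runs half as often. Real systems would gate such approximation on measured quality metrics rather than apply it blindly.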
I will conclude with some insights about how the power of data analytics can help us create more dependable systems.
Bio: Saurabh Bagchi is a Professor in the School of Electrical and Computer Engineering and the Department of Computer Science at Purdue University in West Lafayette, Indiana. He is the founding Director of a university-wide resiliency center at Purdue called CRISP (2017-present). He is an ACM Distinguished Scientist (2013), a Senior Member of IEEE (2007) and of ACM (2009), a Distinguished Speaker for ACM (2012), and an IMPACT Faculty Fellow at Purdue. He is the recipient of an IBM Faculty Award (2014), a Google Faculty Award (2015), and the AT&T Labs VURI Award (2016). He was elected to the IEEE Computer Society Board of Governors for the 2017-19 term.
Saurabh's research interests are in distributed systems and dependable computing. He is proudest of the 18 PhD students who have graduated from his research group and are in various stages of building wonderful careers in industry or academia. In his group, he and his students have far too much fun building and breaking real systems. Saurabh received his MS and PhD degrees from the University of Illinois at Urbana-Champaign and his BS degree from the Indian Institute of Technology Kharagpur, all in Computer Science.