Colloquium: Automated Testing and Debugging for Data-Centric Software
Abstract: Data-intensive scalable computing (DISC) systems such as MapReduce, Google FlumeJava, and Apache Spark are commonly used today to process terabytes of data. At this scale, rare and buggy corner cases frequently show up in production, causing a job to crash after running for days or, worse, to silently produce corrupted output. Unfortunately, in this domain, “testing on a random sample” rarely guarantees reliability, and “printf” debugging methods are expensive.
In this talk, I will describe the insights behind techniques that make automated debugging and testing feasible for data-centric software. First, I will present BigDebug and BigSift, which redesign interactive and automated debugging primitives for data-centric software. I will show how we leverage ideas from systems and database research to reduce debugging time by half and perform precise root-cause analysis in a fraction of the job execution time. Second, I will discuss BigTest, which systematically explores dataflow program paths and automatically generates test data that is orders of magnitude smaller yet several times more effective in revealing critical bugs. Finally, I will conclude with a broader vision of designing productivity toolkits to support the growing needs of data-centric software in ML, AI, and data science.
Bio: Muhammad Ali Gulzar is a Ph.D. candidate in the Department of Computer Science at the University of California, Los Angeles. His research brings together a unique combination of ideas from software engineering, distributed systems, and databases to accelerate the development of reliable big data applications. Gulzar's prior work has been recognized with the 2017 Google Ph.D. Fellowship, the 2018 ACM SRC gold medal, and the 2016 “The Best of VLDB” award.
*To sign up for an appointment to meet with Muhammad Ali Gulzar, please use the following link: Appointment Sign Up Sheet.
All other scheduling inquiries may be sent to < firstname.lastname@example.org >.