Colloquia: Programming Abstractions for Data Stream Processing Systems

February 26, 2018 - 11:15am to 12:15pm
University of Pennsylvania
3-180 Keller Hall
Eric Van Wyk

Bio: Konstantinos Mamouras is a postdoctoral researcher in the Department of Computer and Information Science at the University of Pennsylvania. Before joining Penn, he was a Ph.D. student in the Computer Science Department at Cornell University. He completed the MSc in Advanced Computing program at Imperial College London and the undergraduate program in Electrical and Computer Engineering at the National Technical University of Athens. He currently works on the design of programming abstractions for processing data streams. Many real-time decision-making applications rely on computing quantitative summaries of very large data streams; in his work, a compilation algorithm translates a high-level query into a streaming algorithm with precise guarantees for resource usage. He is also interested in program semantics and logics for program verification, in particular equational theories of programs based on the framework of Kleene Algebra with Tests.

Abstract: Modern information processing systems increasingly demand the ability to continuously process incoming streaming data in a timely and reliable manner. Data streams arise in diverse applications, ranging from patient monitoring in healthcare to real-time decision-making in emerging Internet of Things (IoT) systems. In this talk, I will present my research on the design of programming abstractions for stream processing that enable guarantees of correctness and predictable performance. First, I will present StreamQRE, a declarative domain-specific language and execution engine for stream processing. StreamQRE offers strong theoretical guarantees for resource usage, and its performance on realistic workloads is shown to compare favorably with that of other popular streaming engines. As a case study, I will discuss the application of StreamQRE to the design-space exploration of alternative algorithms for cardiac arrhythmia detection. Finally, I will introduce a type-based framework for the logical specification of distributed streaming computations that facilitates correct and efficient deployment on distributed architectures such as Apache Storm.