COBRA: A Framework for Continuous Profiling and Binary Re-Adaptation

Date of Submission: 
May 9, 2008
Report Number: 
08-016
Report PDF: 
Abstract: 

Dynamic optimizers have shown to improve performance and power efficiency of single-threaded applications. Multithreaded applications running on CMP, SMP and cc-NUMA systems also exhibit opportunities for dynamic binary optimization. Existing dynamic optimizers lack efficient monitoring schemes for multiple threads to support appropriate thread specific or system-wide optimization for a collective behavior of multiple threads since they are designed primarily for single-threaded programs. Monitoring and collecting profiles from multiple threads expose optimization opportunities not only for single core, but also for multi-core systems that include interconnection networks and the cache coherent protocol. Detecting global phases of multithreaded programs and determining appropriate optimizations by considering the interaction between threads such as coherent misses are some features of the dynamic binary optimizer presented in this thesis when compared to the prior dynamic optimizers for single threaded programs.

This thesis presents COBRA (Continuous Binary Re-Adaptation), a dynamic binary optimization framework, for single-threaded and multithreaded applications. It includes components for collective monitoring and dynamic profiling, profile and trace management, code optimization and code deployment. The monitoring component collects the hot branches and performance information from multiple working threads with the support of OS and the hardware performance monitors. It sends data to the dynamic profiler. The dynamic profiler accumulates performance bottleneck profiles such as cache miss information along with hot branch traces. Optimizer generates new optimized binary traces and stored them in the code cache. Profiler and optimizer closely interact with each other in order to optimize for more effective code layout and fewer data cache miss stalls. The continuous profiling component only monitors the performance behavior of optimized binary traces and generates the feedback information to determine the efficiency of optimizations for guiding continuous re-optimization. It is currently implemented on Itanium 2 based CMP, SMP and cc-NUMA systems.

This thesis proposes a new phase detection scheme and hardware support, especially for dynamic optimizations, that effectively identifies and accurately predicts program phases by exploiting program control flow information. This scheme could not only be applied on single-threaded programs, but also more efficiently applied on multithreaded programs. Our proposed phase detection scheme effectively identifies dynamic intervals that are contiguous variable-length intervals aligned with dynamic code regions that show distinct single and parallel program phase behavior.

Two efficient phase-aware runtime program monitoring schemes are implemented on our COBRA framework. The sampled Basic Block...
[NOTE - Abstract continues in actual report]