Using Large Input Sets with Hardware Performance Monitoring for Profile Based Compiler Optimizations

Date of Submission: September 17, 2004
Traditional profile-guided optimization (PGO) instruments the program and runs it on one or more small training input sets to generate edge or value profiles that guide compiler optimizations. This approach has been effective in predicting branch directions for many applications. However, for optimizations that depend more heavily on performance characteristics and profile accuracy, it is not clear whether profiles generated with small input sets can reliably predict program behavior under different input sets. We studied the frequent execution paths, IPC, and stall-cycle breakdown of the test, train, and different reference input sets of the SPEC2000Int benchmarks. Our studies indicate that small input sets are less effective at predicting program behavior for larger input sets. We propose using profiles based on hardware performance monitor (HPM) sampling to guide optimizations, because HPM sampling can work with larger input sets and gather information on important performance events. As a proof of concept, we have implemented one type of HPM-sampling-based profile-based optimization (PBO): we use the dynamic call paths sampled by the HPM to automatically guide procedure inlining in the ORC compiler. Our results show that this approach has much lower profiling overhead and offers significant performance gains.
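The idea of turning sampled call paths into inlining hints can be illustrated with a minimal sketch. This is not the ORC implementation: the function names, the sample format (each sample is a list of function names from outermost caller to innermost callee), and the `min_fraction` threshold are all assumptions made for illustration only.

```python
from collections import Counter


def edge_counts(sampled_paths):
    """Count, for each caller -> callee edge, how many samples contain it.

    Assumption: each sample is a call path given as a list of function
    names, outermost caller first. An edge is counted at most once per
    sample, so recursion does not inflate the count.
    """
    counts = Counter()
    for path in sampled_paths:
        edges_in_sample = set(zip(path, path[1:]))
        counts.update(edges_in_sample)
    return counts


def inline_candidates(sampled_paths, min_fraction=0.25):
    """Return call edges seen in at least `min_fraction` of all samples.

    `min_fraction` is a hypothetical tuning knob, not a value from the
    report. Edges are returned hottest first, ties broken by name.
    """
    total = len(sampled_paths)
    if total == 0:
        return []
    counts = edge_counts(sampled_paths)
    hot = [(edge, n) for edge, n in counts.items() if n / total >= min_fraction]
    hot.sort(key=lambda item: (-item[1], item[0]))
    return [edge for edge, _ in hot]


# Hypothetical samples, standing in for call paths collected by an HPM.
samples = [
    ["main", "parse", "next_token"],
    ["main", "parse", "next_token"],
    ["main", "parse", "next_token"],
    ["main", "emit"],
]
candidates = inline_candidates(samples, min_fraction=0.5)
```

In this sketch a compiler would treat the returned edges as preferred inlining sites. A real heuristic would presumably also weigh callee size and code growth, which are omitted here.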