PASS: Program Structure Aware Stratified Sampling for Statistically Selecting Instruction Traces and Simulation Points
As modeled microarchitectures become more complex and the size of benchmark program keeps increasing, simulating a complete program with various input sets is practically infeasible within a given time and computation resource budget. A common approach is to simulate only a subset of representative parts of the program selected from the complete program execution. SimPoint [1,2] and SMARTS  have shown that accurate performance estimates can be achieved with a relatively small number of instructions.
This paper proposes a novel method called Program structure Aware Stratified Sampling (PASS) for further reducing microarchitecture simulation time without losing accuracy and coverage. PASS has four major phases, consisting of building Extended Calling Context Tree (ECCT), dynamic code region analysis, program behavior profiling, and stratified sampling. ECCT is constructed to represent program calling context and repetitive behavior via dynamic instrumentation. Dynamic code region analysis identifies code regions with similar program phase behaviors. Program behavior profiling stores statistical information of program behaviors such as number of executed instructions, branch mispredictions, and cache miss associated with each code region. Based on the variability of each phase, we adaptively sample instances of instruction streams through stratified sampling.
We applied PASS on 12 SPEC CPU2000 benchmark and input combinations and achieved average 1.46 % IPC error bound from measurements of native execution on Itanium-2 machine with much smaller sampled instruction streams.