Investigating the limits of instruction level parallelism
High performance computer architectures increasingly use compile-time instruction scheduling to reorder code to expose parallelism that can be exploited at run-time. Although respectable performance increases have been reported, there is still a significant performance gap between what has been achieved and what has theoretically been shown to be possible. All scheduling algorithms used to reorder code, either explicitly or implicitly introduce barriers to code motion, which in turn limit the performance realised. Trace driven simulation is used to quantify the amount of instruction level parallelism available in general purpose code and the impact of various artificial barriers to code motion. This work is based on the Hatfield Superscalar Architecture, a progressive multiple instruction issue processor. The results of this study will be used to direct future developments in instruction scheduling technology.