|dc.description.abstract||Energy efficiency is of the utmost importance in modern high-performance embedded processor design. As the number of transistors on a chip continues to increase each year, and processor logic becomes ever more complex, the dynamic switching power cost of running such processors increases. The continual progression in fabrication processes brings a reduction in the feature size of the transistor structures on chips with each new technology generation. This reduction in size increases the significance of leakage power (a constant drain that is proportional to the number of transistors). Particularly in embedded devices, the proportion of an electronic product’s power budget accounted for by the CPU is significant (often as much as 50%).
Dynamic branch prediction is a hardware mechanism used to forecast the direction and target address of branch instructions. This is essential to high-performance pipelined and superscalar processors, where the direction and target of branches are not computed until several stages into the pipeline. Accurate branch prediction also increases energy efficiency by reducing the amount of time spent executing mis-speculated instructions. ‘Stalling’ is no longer a sensible option when the significance of static power dissipation is considered. Dynamic branch prediction logic typically accounts for over 10% of a processor’s global power dissipation, making it an obvious target for energy optimisation.
Previous approaches to increasing the energy efficiency of dynamic branch prediction logic have focused on either fully dynamic or fully static techniques. Dynamic techniques include the introduction of a new cache-like structure that can decide whether branch prediction logic should be accessed for a given branch, while static techniques tend to focus on scheduling around branch instructions so that a prediction is not needed (or the branch is removed completely).
This dissertation explores a method of combining static techniques and profiling information with simple hardware support in order to reduce the number of accesses made to a branch predictor. The local delay region is used on unconditional absolute branches to avoid prediction, and, for most other branches, Adaptive Branch Bias Measurement (through profiling) is used to assign a static prediction that is as accurate as a dynamic prediction for that branch. This information is represented as two hint-bits in branch instructions, and then interpreted by simple hardware logic that bypasses both the lookup and update phases for appropriate branches.
The global processor power saving that can be achieved by this Combined Algorithm is around 6% on the experimental architectures shown. These architectures are based upon real contemporary embedded architecture specifications.
The introduction of the Combined Algorithm also significantly reduces the execution time of programs on Multiple Instruction Issue processors. This is attributed to the increase achieved in global prediction accuracy.||en