Function better with power predictability
01 September 2007
In today’s process geometries, power consumption has become a factor that must be managed in all chip designs. For many types of chips, such as plugged-in applications, this is a new challenge.
In many applications, power consumption can cause unpleasant surprises late in the design cycle. As if meeting aggressive frequencies and managing power consumption were not enough of a challenge on their own, they are actually opposing forces. In other words, optimising for speed will increase power, and conversely, techniques to reduce power will reduce speed. While reducing area will reduce power, some power architectures can cause area increases that must be accounted for in early decision making.
These surprises often occur during physical implementation, where timing closure becomes more challenging due to long wires. Logic designers feel helpless at this point, left to only hope that power consumption does not require having to go with a different package, removing functionality, or other schedule-affecting measures. The logic designer’s lament is: ‘if I had only known earlier, I could have done something about it’. What can be done to improve power predictability?
As with any design problem, decisions and actions taken earlier in the process will have a broader effect on the outcome. The flipside is that predictions earlier in the flow will be less accurate. So coarse-grained techniques should be explored early, and more accurate predictions should be used to drive finer-grained techniques during implementation.
The first step in improving the predictability of any process involves establishing measurement criteria. Metrics and milestones must be created and tracked, and immediate action taken when necessary. Fortunately, chip designs all start with a detailed specification, where these metrics and milestones exist and can be extracted. Then progress should be monitored regularly throughout the project so that course correction can happen quickly if problems arise.
While the design’s specification is business- and market-driven, it has the largest impact on power. For instance, functionality, frequency, memories and process geometry, all have a huge impact on power. Thus it is important to make these decisions with as much insight into power effects as is possible. This is typically done by looking at what has been done before and applying a ‘process shrink factor’. With so many variables affecting power consumption at such a large magnitude, this method is too coarse. What is needed is a method that combines the experience from previous designs with as much real implementation information as possible. This can be accomplished by using early physical prototyping of the hard macros and I/O’s in the new process, combined with RTL power estimation of the RTL available, targeting the new process. This enables a more accurate estimate that reflects implementation specific details, and can be refined as more details become available during the design process.
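As a sketch of the approach described above, the snippet below combines scaled numbers from a previous design with fresh estimates for blocks whose RTL already exists, using the standard CMOS dynamic power relation P = αCV²f. The block names, capacitance and activity values, and the shrink factor are all illustrative assumptions, not data from any real flow.

```python
# Early chip-level power estimate: scale a previous design's measured block
# power by a coarse process-shrink factor, then override any block for which
# an RTL-based estimate in the new process is already available.

def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """CMOS switching power of one block in watts: P = alpha * C * V^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz

# Measured power (mW) per block from the previous design (illustrative).
previous_design_mw = {"cpu": 450.0, "dsp": 300.0, "io": 120.0}

# Coarse shrink factor for the new process node (assumed, not silicon data).
SHRINK_FACTOR = 0.7

# Blocks whose RTL is ready get a real estimate in the new process; here only
# the 'cpu' block has RTL, with assumed activity, capacitance and frequency.
rtl_estimates_mw = {"cpu": dynamic_power(0.15, 2.0e-9, 1.0, 800e6) * 1e3}

def early_chip_estimate(previous_mw, shrink, rtl_mw):
    """Prefer RTL estimates where available; fall back to scaled history."""
    total = 0.0
    for block, power in previous_mw.items():
        total += rtl_mw.get(block, power * shrink)
    return total

total = early_chip_estimate(previous_design_mw, SHRINK_FACTOR, rtl_estimates_mw)
print(f"Early chip estimate: {total:.0f} mW")
```

The point of the structure is the one the article makes: the estimate can be refined block by block as real implementation data replaces the scaled historical numbers.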
A big part of estimating power consumption is the operating or activity profile of the design. Early prototypes can use default switching activities because the measurements are rough and not enough of the design is available to generate accurate switching activity. As the design moves through implementation, more accuracy is required.
Functional verification testbenches will not generate an accurate switching activity because they are focused on covering all functional scenarios with as little repetition as possible. ATPG vectors are not sufficient for similar reasons. Of course, real operation involves a lot of repetition and spends little time in the corner cases that verification focuses on. Thus the best method for capturing switching activity is to have separate simulation runs that capture operating activity. Hardware acceleration can greatly improve efficiency here. Emulation is the ideal method since it runs the actual software, but this is not available until later in the design cycle. Getting accurate switching activity requires most of the RTL to be completed, so it is often done concurrently with, or iterating with, specification of the power implementation architecture. This part of the process is where implementation trade-offs are made, determining which parts of the chip are performance critical and which are not. This will dictate what voltage level(s) to use, whether a variable voltage should be used, whether a block can be shut down, and if so, what signals need to be isolated and whether it needs to use state retention.
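The contrast between verification activity and operating activity can be illustrated with a simple toggle count over sampled signal traces. This is a toy stand-in for the activity files (e.g. VCD or SAIF dumps) a simulator or emulator would actually produce; the traces themselves are invented for illustration.

```python
# Switching activity as a toggle rate: the fraction of clock cycles in which
# a signal changed value across a sampled trace.

def toggle_rate(samples):
    """Return toggles per cycle for a list of sampled signal values."""
    toggles = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return toggles / (len(samples) - 1)

# Real operation is repetitive; a testbench packs in corner cases with as
# little repetition as possible, so its activity profile is inflated.
operating_trace = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]   # steady, periodic activity
testbench_trace = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]   # dense corner-case stimulus

print(f"operating: {toggle_rate(operating_trace):.2f} toggles/cycle")
print(f"testbench: {toggle_rate(testbench_trace):.2f} toggles/cycle")
```

Feeding the testbench-derived rate into a power estimate would overstate dynamic power, which is why the article recommends dedicated runs that model real operation.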
The implementation architecture has a large effect on power, but it must also be balanced with frequency goals and implementation feasibility. Techniques like power shut-off and frequency scaling will affect functionality, so it is important to close early on the right architecture. Time should be spent performing early ‘what if’ analysis to predict the power-timing-area-complexity trade-offs. This can be done quickly with RTL power estimation, but as the choices are narrowed, a more accurate measurement should be used: actual synthesis results or a silicon virtual prototype. The most efficient way to accomplish this is to have a central specification of the power implementation architecture, where a single change can propagate across the flow. The result of this process should be a power architecture that is projected to meet the performance, power and functional specification.
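A minimal ‘what if’ sweep might look like the following: dynamic power falls with the square of the supply voltage, while gate delay grows as the supply approaches the threshold voltage (a simplified form of the alpha-power delay model). The threshold voltage, reference power and voltage points are illustrative assumptions, not process data.

```python
# 'What if' sweep over candidate supply voltages: power scales as V^2,
# delay is modelled (simplistically) as proportional to 1 / (V - Vth).

V_TH = 0.35                 # threshold voltage in volts (assumed)
P_REF, V_REF = 500.0, 1.1   # reference dynamic power (mW) at reference Vdd

def what_if(vdd):
    """Return (power_mw, delay relative to the reference supply)."""
    power_mw = P_REF * (vdd / V_REF) ** 2
    rel_delay = (V_REF - V_TH) / (vdd - V_TH)
    return power_mw, rel_delay

for vdd in (1.1, 1.0, 0.9):
    p, d = what_if(vdd)
    print(f"Vdd={vdd:.1f} V: {p:5.0f} mW, delay x{d:.2f}")
```

Even this toy model shows the shape of the trade-off the article describes: each voltage step down buys a quadratic power saving at the cost of slower logic, which is exactly what the narrowing-down with synthesis results or a silicon virtual prototype must then confirm.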
The importance of RTL
Implementation begins at RTL synthesis, so it is important to begin looking at power consumption as soon as RTL becomes available. RTL power estimation can be used to examine relative block power consumption. If any block uses a higher percentage of power than expected, it can be addressed early through re-budgeting, fixing RTL or fixing constraints. Worst case is having to make a change to the power architecture, but at this point it is still not too late for such a change. It is always easier to make changes earlier than later, so power should be measured as often as designers measure timing today.
RTL estimation needs to have a strong link to synthesis optimisation because of the wide range of optimisations performed during synthesis. For instance, will the design use clock gating? If so, will it use multi-stage clock gating? What are the minimum/maximum register constraints? Also, the logic structures created during synthesis have a great effect on power, and these need to be captured in RTL estimation.
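To see why an RTL estimator must know whether synthesis will insert clock gating, consider a back-of-the-envelope model of a register bank that only loads new data part of the time. All constants here are illustrative assumptions.

```python
# Effect of clock gating on a register bank: when the bank's enable is low,
# a clock-gating cell stops the clock pins from toggling, at the cost of the
# gating cell's own power. Constants are illustrative, not library data.

CLOCK_PIN_POWER_UW = 2.0   # clock-pin power per register when clocked, in uW
N_REGS = 1000              # registers in the bank
ENABLE_DUTY = 0.25         # fraction of cycles the bank actually loads data
GATE_OVERHEAD_UW = 5.0     # power of the inserted clock-gating cell

ungated = CLOCK_PIN_POWER_UW * N_REGS
gated = CLOCK_PIN_POWER_UW * N_REGS * ENABLE_DUTY + GATE_OVERHEAD_UW

print(f"ungated: {ungated:.0f} uW, gated: {gated:.0f} uW")
```

An RTL estimate that assumes the ungated number when synthesis will in fact gate the clock (or vice versa) is off by the whole difference, which is the article’s point about linking estimation to the synthesis optimisations actually performed.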
A slow logic structure will have to be upsized, buffered or implemented with low voltage threshold cells during physical design, causing a power increase. Front-end teams often globally apply upwards of 30 per cent timing margin to ensure smooth timing closure, but this overpowers most of the logic just to assure that a few critical paths meet timing. Any structure that is too fast consumes more power than necessary. So optimisation that focuses on multiple objectives simultaneously is key to creating a netlist that will close in physical design more predictably. Logic is structured to simultaneously address timing, area and power objectives. Timing-critical paths are isolated and structured such that they require less effort in physical design. Finally, using a more physically realistic model of wire timing is essential to creating structures that will meet frequency targets without being over-powered.
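The cost of blanket over-margining can be sketched numerically. Assume, purely for illustration, that making a path x% faster costs roughly x% more power on that path (through upsizing or low-Vt swaps); the slack distribution below is also invented.

```python
# Cost of globally applying timing margin: paths that already met the real
# clock period must still be sped up to meet the margined target, and the
# speed-up costs power. The linear power-vs-speed factor is an assumption.

MARGIN = 0.30            # global margin as a fraction of the clock period
POWER_PER_SPEEDUP = 1.0  # assumed: 1% faster path -> 1% more path power

# Path slacks as a fraction of the clock period (positive = meets timing).
path_slacks = [0.40, 0.25, 0.10, 0.02, -0.01]

def extra_power_fraction(slacks, margin):
    """Average fractional power added to make every path meet the
    margined target instead of the real clock period."""
    extra = 0.0
    for s in slacks:
        shortfall = margin - s       # how much faster this path must get
        if shortfall > 0:
            extra += shortfall * POWER_PER_SPEEDUP
    return extra / len(slacks)

print(f"{extra_power_fraction(path_slacks, MARGIN):.1%} average power added")
```

Only one of the five paths actually misses timing, yet four of them pay a power penalty under the margined target, which is the over-powering effect the article describes.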
The last step?
Physical implementation is seen as the final step from the logic design team’s point of view, one over which they have little control, yet where bad power surprises happen. However, many of the steps already outlined can prevent these surprises. If a well-balanced logic structure was created by a multi-objective synthesis engine, it will likely close timing in physical implementation much more cleanly, requiring less powering-up. If silicon prototyping was done and refined as blocks were developed, it will give a good idea of how physical implementation will go. This provides logic design teams with the ultimate in predictability for timing, area and, of course, power.
Achieving a predictable power closure flow goes beyond just doing early estimation. It requires that power be a metric and a core part of the process from the early stages of design conception, constantly analysed and re-evaluated throughout the process. It requires some changes to design methodologies, to take advantage of modern power reduction techniques. Additionally, it requires some changes to EDA tools, to enable fast timing-power-area trade-offs and to generate globally balanced logic structures that do not blow up during physical implementation. Power has begun to have a profound impact on the market and technical feasibility of digital ICs. It only seems logical that the way to address this is through a holistic approach to managing power predictability.
JACK ERICKSON is product marketing director, Cadence Design Systems