Bulldozer Goes to 11
If you are a fan of fine cinema like me, then you probably remember the scene in “This is Spinal Tap” where Nigel Tufnel explains how his guitar amp was modified so that now all of the dials “go to 11.” He’s asked “Why don’t you just make 10 louder and make 10 be the top number, and make that a little louder?” His response? “These go to 11.”
Bulldozer goes to 11.
We are including a new feature in “Bulldozer” that will help you take your performance up to 11. This feature, which is new to our server processors, is called AMD Turbo CORE technology and it allows you to capture that extra power headroom between average and maximum power, turning it into more clock speed. So, how does it work? Perhaps a little background about how clock speed is derived will help first.
Contrary to popular belief, clock speed is not determined based on the best case scenario; that is to say that when we stamp a number like 2.2 GHz onto an AMD Opteron™ 6174 processor, that isn’t the best that it can perform. To some degree it is more like the worst case scenario.
Every processor runs workloads, so to set the clock speed you find the maximum speed at which the processor can run before it hits the maximum amount of power allowed, the thermal design power or TDP. TDP is the maximum allowed power consumption, but, like the top speed of your car, you will rarely ever hit it because most workloads don’t stress the processors the way the testing does.
When we test processors to assign their clock speed, we actually test them under a methodology that includes using programs designed to stress every transistor at the same time, maximizing the power consumption to try to reach TDP. The challenge is that the workloads that most customers use don’t come anywhere close to consuming the power that the test does, so the actual marked clock speed is conservative. Obviously when I type this blog the CPU power is nowhere near the power of running computational fluid dynamics programs, so there is a variance between workloads; the same silicon would have a lower clock speed for server than client workloads because the client workload is less intensive.
Back when I arrived at AMD, one of the most common complaints from customers was around the fact that we marked our processors with a 95W TDP, but they rarely consumed more than 50W; this meant customers were over-provisioning their data centers and not maximizing their rack space. We developed ACP (Average CPU Power) to address this. TDP still represented the top power consumption of the processor, but the ACP reflected what customers would see in regular use with regular workloads. I still felt that ACP was too conservative because it looked at servers under 100% load, but that is a discussion for another day.
Now let’s begin by taking a processor from the past, an AMD Opteron 2376 processor, which had an ACP of 75W and a TDP of 115W for comparison. The processor has a 40W spread between average and max, or about 35% of “headroom” above where the average workload runs.
Today’s 12-core AMD Opteron™ 6174 still has a TDP of 115W, with an ACP of 80W, or ~30% headroom. Why did the ACP go up, narrowing the gap? Well, we added a technology called AMD CoolSpeed, which will throttle down the processor if it gets too close to the TDP power. This technology allows AMD to “lean into” that additional headroom, picking up some extra frequency. This is how AMD was able to put 12 cores into the same thermal range as our previous 6-core products while maintaining reasonably close clock speeds.
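The headroom percentages quoted above appear to be the spread between TDP and ACP expressed as a share of TDP (40W of 115W is roughly 35%). That reading is inferred from the numbers rather than spelled out in the post. A minimal Python sketch of the arithmetic, using a hypothetical helper named `headroom`:

```python
def headroom(tdp_watts, acp_watts):
    """Return (spread in watts, spread as a fraction of TDP).

    Assumption: headroom is measured relative to TDP, which matches
    the ~35% and ~30% figures quoted in the post.
    """
    spread = tdp_watts - acp_watts
    return spread, spread / tdp_watts


# TDP and ACP values are the ones given in the post.
for name, tdp, acp in [("Opteron 2376", 115, 75), ("Opteron 6174", 115, 80)]:
    watts, fraction = headroom(tdp, acp)
    print(f"{name}: {watts}W spread, {fraction:.0%} of TDP")
```

The 5-point difference between the two results is the portion the post attributes to AMD CoolSpeed.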
Prior to AMD CoolSpeed we needed to keep the clock speed lower, but now that we have a way to pull back the speed, we can actually achieve higher clock speeds. If you’ve ever been to the Grand Canyon you know that where there is a railing people will get much closer to the edge of the rim, but if there is no guard rail, you’re going to stand back a little more for safety.
However, even though we have recouped some of that headroom (~5%) through AMD CoolSpeed, how do we tap into the rest? That is where AMD Turbo CORE comes in. AMD Turbo CORE is a technology that was recently introduced in the AMD Phenom™ processor, but the way we deliver it in Bulldozer is greatly enhanced.
AMD Turbo CORE allows customers to tap into that additional clock speed headroom by letting the processor rise from the base clock speed toward the TDP level, automatically unlocking extra potential for the processor. Should the processor get too close to the power limit, it automatically steps back a bit to ensure that it continues to operate within the specified guidelines. This allows for significantly higher maximum clock speeds.
The chart below shows how Bulldozer allows AMD to potentially capture that lost headroom and turn it back into clock speed for the processor to utilize:
*For illustrative purposes only; not to scale
Some of the benefits of AMD Turbo CORE include:
- Up to 500MHz of additional clock speed available with all cores active. This means even with 16 cores active with server workloads, all cores can boost at the same time. For those customers that want to maximize their performance, they now have the tools to do it.
- Even higher boost states available with half of the cores active. We’re not stating exactly how high processors can boost with AMD Turbo CORE, but if there is room for up to 500MHz with all cores active, fewer active cores would mean less power and more headroom to recapture with AMD Turbo CORE. At launch you will see processors marketed with a base and a maximum frequency: base will reflect the actual clock speed on the processor and max will reflect the highest AMD Turbo CORE state.
- AMD Turbo CORE is deterministic, governed by power draw, not by temperature as competing products are. This means that even in warmer climates you’ll be able to take advantage of that extra headroom if you choose. This helps ensure that the max frequency is workload dependent, making it more consistent and repeatable.
With Bulldozer AMD now gives customers ways to maximize their processors for various roles in the data center:
Does AMD Turbo CORE deliver a guaranteed frequency and can you just set the frequency to a higher multiplier? No, that is not how it works. AMD Turbo CORE is continually monitoring the processor power consumption to determine the maximum processor state. For simplicity’s sake, think of it as the opposite of AMD PowerNow! technology. Instead of trying to watch for usage patterns and lower the processor core to try to reduce power consumption, Turbo CORE is watching the power consumption to see how high it can move the clock speed up. The clock speed will vary but not as much as with AMD PowerNow! because it is moving over a different range and responding to different algorithms.
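To make the idea of a power-governed boost concrete, here is a rough Python sketch of a control step that raises the clock while measured power sits comfortably below TDP and backs off when it gets close. This is not AMD's algorithm. The function name, step size, guard band and base frequency are all hypothetical, chosen only for illustration; the 500MHz boost range is the all-cores figure mentioned earlier.

```python
def next_frequency_mhz(current_mhz, power_watts, tdp_watts,
                       base_mhz, max_boost_mhz,
                       step_mhz=100, guard_watts=5.0):
    """Pick the next clock speed from measured power draw.

    Hypothetical policy: step up while power is below (TDP - guard band),
    step back once inside the guard band, and never leave the
    [base_mhz, max_boost_mhz] range.
    """
    if power_watts >= tdp_watts - guard_watts:
        target = current_mhz - step_mhz   # too close to the limit: back off
    else:
        target = current_mhz + step_mhz   # headroom available: boost
    return max(base_mhz, min(max_boost_mhz, target))


# Illustrative power trace in watts; not measured data.
freq = 2200
for power in [70, 80, 90, 105, 112, 113, 95]:
    freq = next_frequency_mhz(freq, power, tdp_watts=115,
                              base_mhz=2200, max_boost_mhz=2700)
    print(f"power={power}W -> {freq}MHz")
```

Because the decision depends only on power draw, the same workload should land on the same frequency regardless of ambient temperature, which is the sense in which the post calls the feature deterministic.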
The diagram below is a depiction of how a typical server could utilize AMD Turbo CORE technology in conjunction with the standard P-state management through the AMD PowerNow! technology:
As you can see over time, in periods of normal activity the processor will fluctuate between the minimum frequency and the base frequency through use of the AMD PowerNow! driver, which is integrated into most modern operating systems. The fluctuation occurs in what I will call the “standard zone” – the normal operating range. But if the workload suddenly becomes much higher in utilization, you see the processor move into what I call the “boost zone” – this is where AMD Turbo CORE takes over, because of the high demands, and squeezes as much performance out of the processor as it can.
Now, I am sure that you are saying “great, but that just pushes up my overall power.” Well, yes, power always does go up with frequency, but the most important thing to remember is that this is also a variable technology; it only increases frequency based on total demand from the application. The system runs at the base frequency unless there is a need for more performance from the application, so in environments that are “bursty” you’ll vary frequency with AMD PowerNow! between base and the lowest power. If demand for performance is sustained, the processor runs between base and max, so that you are getting the most performance, but still not wasting power.
The combination of AMD PowerNow! and the new AMD Turbo CORE technology allows customers to both maximize their clock speed when they need it most, yet still keep their power in check by reducing clock frequency as the load begins to diminish.
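The two operating ranges described above can be summarized in a few lines of Python. This is only a sketch: the function name `operating_zone` and the example frequencies are hypothetical, and real P-state management happens in the operating system driver and the processor, not in application code.

```python
def operating_zone(frequency_mhz, min_mhz, base_mhz, max_boost_mhz):
    """Name the zone a clock speed falls in, following the post's terms."""
    if not (min_mhz <= frequency_mhz <= max_boost_mhz):
        raise ValueError("frequency outside the processor's range")
    if frequency_mhz > base_mhz:
        return "boost zone"      # above base: AMD Turbo CORE range
    return "standard zone"       # minimum to base: AMD PowerNow! range


# Hypothetical frequencies for illustration only.
for mhz in (800, 1500, 2200, 2500, 2700):
    zone = operating_zone(mhz, min_mhz=800, base_mhz=2200, max_boost_mhz=2700)
    print(f"{mhz}MHz: {zone}")
```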
As you can see, there is a new level of flexibility in AMD Opteron processors with this new feature being added and this should allow customers even more control in their data centers. Even if they want to take it to 11.
John Fruehe is the Director of Product Marketing for Server, Embedded and FireStream products at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.





IBM has introduced Graphene instead of silicon to processors to make them go at 300 GHz
Of course it will still be awhile before we actually see any of them.
I had a 500MHz, slot A Athlon and eventually moved up to dual 2.2GHz Opterons. I loved them, but I now have an i7 950. Ya gotta crank up the IPC! These turbo boosts only get you so far. In Bulldozer, sharing the math coprocessor is probably a good move, for the math coprocessor wars are won, by everyone. But branchy integer code and loops are where it’s at. Everything is a virtual machine anymore. You gotta beef up the TLB’s and speed up the caches. Even find new things to cache! I would like to buy AMD again. Give us a processor that will crunch loops and conditional store/merges and you can hold off on AVN compatibility. Enterprise and business software is getting expensive and Open Source software is conservatively built on GNU compilers several ISA versions back. The crazy math programs are running on GPUs anyhow. Hell, tune your procs to Oracle and .NET and the buyers will come. Thank you.
Where can I ask questions about the game performance of Bulldozer processors?
Unfortunately this is the business blog.
Is it still too early to ask some more specific questions about Bulldozer performance in light of the recently leaked performance information (and the optimization manual for the Bulldozer architecture)?
In particular….
The benchmark charts that you see floating around the internet are most likely not real. Too many people with too much time on their hands. Keep in mind that there are always several revisions of silicon that come out before the final. The early rounds of silicon are generally targeted at lower clock speeds because you want to maximize the yield (when you are going through the validation cycles it is better to have more parts in hand than fewer parts with higher frequencies; we do all of the performance work on the final rev of silicon.)
I would be careful not to draw conclusions from the software optimization guide, as that only gives you a small insight into the architecture. With the data that is available today you are only seeing one small part of the architecture. Using the lens of previous architectures to evaluate a new architecture is going to yield less than optimal comparisons. When the first CD specifications came out, if you compared the size to a normal record, you would assume that it could only hold half of the music.
Thanks for the answer. Does it mean that there is hope that the final specifications of the mass-production Bulldozer revision could be better than what is assumed from the revision specifications (for engineering samples, I suppose) described in the optimization guide?
We obviously don’t discuss that until launch.