“Bulldozer” 20 Questions, Part 1
You’ve sent in your questions and we’ve begun to sort through them to pull out the best. Plenty of common themes came up, so we’ll be grouping some of the bigger categories together. I am going to tackle some of the easiest ones first because some of the more technical questions will need to go to the engineers.
We’ll handle this blog in four rounds of five questions each.
Let’s get started.
“There has been some confusion among those in the tech community regarding the actual CPU architecture, with ‘modules’ and ‘cores’ being explained differently by different people.” – Waffle911
Yes, there has definitely been some confusion about modules and cores. Modules are only our way of laying out the subcomponents of the processor. You will not see us market modules, as they are largely invisible to everyone but the designers. Operating systems, for instance, will enumerate the integer cores, seeing a 16-core AMD Opteron™ processor (currently codenamed “Interlagos”) as 16 cores, not 8 modules. Modules do affect the way that certain CPU features are addressed (a discussion we’ll save for a later date), but in general we will focus on cores and not modules. The reason that we have modules is to help cut down on a lot of redundant circuitry in the processor. With multiple cores there is a lot of duplication, which eats up die space and increases power draw. Some areas within the processor can be shared because sharing them has no major impact on performance; other areas should not be shared because sharing them would create bottlenecks.
You will never see a spec sheet with modules called out. Modules will not have a “marketing name”; they will only be “Bulldozer” modules. In reality, modules will only matter to the designers. Since we went out with “Bulldozer” information very early, we focused on the shared architecture and talked at the module level (it is still far too early to be sharing die shots). Because of this, the two most common misunderstandings became a) that the module was the whole processor and b) that the module was somehow equal to one core.
When we talk about cores we will always be using the most widely agreed-upon definition of a core: the integer logic. Today most workloads are integer, with a much smaller portion being floating point. This is why we settled on integer cores as the most logical way to define a core.
Each integer core will be able to run one software thread, and all of these threads can run simultaneously, unlike an SMT-type technology that lets two threads share one core. You typically find SMT technology on processors with much lower core counts, and its shared nature can create bottlenecks, even reducing throughput in some cases.
As for core counts, here is what we have committed to at this point:
- “Interlagos” – 16-core server processor
- “Valencia” – 8-core server processor
- “Zambezi” – 8-core client processor
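To make the core-versus-module distinction concrete, here is a minimal sketch in Python. It assumes two integer cores per module, which is inferred from the “16 cores, not 8 modules” example above rather than taken from any spec sheet; the names `CORES_PER_MODULE` and `module_count` are made up for illustration.

```python
# Assumption: two integer cores per module, inferred from the
# "16 cores, not 8 modules" example in the post (not a published spec).
CORES_PER_MODULE = 2

# Core counts exactly as committed to in the list above.
COMMITTED_CORE_COUNTS = {
    "Interlagos": 16,
    "Valencia": 8,
    "Zambezi": 8,
}


def module_count(cores: int, cores_per_module: int = CORES_PER_MODULE) -> int:
    """Return how many modules would hold the given number of integer cores."""
    if cores % cores_per_module != 0:
        raise ValueError("core count must be a multiple of cores per module")
    return cores // cores_per_module


for name, cores in COMMITTED_CORE_COUNTS.items():
    # The OS enumerates `cores`; the module count is only a design-level view.
    print(f"{name}: {cores} cores, {module_count(cores)} modules")
```

The number that matters to software is the first one in each line; the second is only how the designers group those cores on the die.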
“What are the virtualization advantages of “Bulldozer” relative to current AMD and “Bulldozer” time-frame Intel architectures?” – Muzaffer Kal
Well, to begin with, the competition has not revealed anything about their virtualization features in that timeframe so I will stick with AMD comparisons.
One of the most striking and easy comparisons to make is the pure core count. In my experience, customers today tend to use the “one VM per core” rule of thumb. In today’s world that means up to 24 VMs for a 2P AMD Opteron™ 6100 Series platform (12 cores per processor x 2 processors = 24 cores = 24 VMs), and up to 32 VMs for a 16-core, “Bulldozer”-based 2P “Interlagos” system. Or you can run several robust multi-core VMs on a server; for example, you could run up to eight VMs on an “Interlagos” system, each with 4 vCPUs.
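The arithmetic behind those VM counts can be sketched in a few lines of Python. The function names are hypothetical, and the “one VM per core” rule is only the rule of thumb mentioned above, not a sizing guarantee.

```python
def total_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores in a platform with identical processors."""
    return sockets * cores_per_socket


def max_vms(cores: int, vcpus_per_vm: int = 1) -> int:
    """VMs that fit if every vCPU gets its own core (rule of thumb only)."""
    if vcpus_per_vm < 1:
        raise ValueError("a VM needs at least one vCPU")
    return cores // vcpus_per_vm


# 2P AMD Opteron 6100 Series platform: 12 cores per processor.
opteron_6100 = total_cores(sockets=2, cores_per_socket=12)
# 2P "Interlagos" platform: 16 cores per processor.
interlagos = total_cores(sockets=2, cores_per_socket=16)

print(max_vms(opteron_6100))                 # one VM per core
print(max_vms(interlagos))                   # one VM per core
print(max_vms(interlagos, vcpus_per_vm=4))   # larger 4-vCPU VMs
```

Changing `vcpus_per_vm` shows how the same core budget trades off between many small VMs and fewer, larger ones.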
Although we will not be releasing technical details yet, some of the new features include making the caches more efficient, preserving live migration compatibility between our cores, and more effectively managing changes to virtual machines such that hypervisor interactions are limited.
In addition to a greater number of cores, the upcoming “Bulldozer” platform will feature L2 cache that is shared between integer cores. Customers who pin VMs to cores will be able to build a 2P VM (one with two virtual processors) and tie it to two cores that share a common L2 cache. This can help cut down on some of the cache latency, as the VM’s two cores have all of the adjacent shared cache lines in a single location.
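As a rough illustration of that pinning idea, the Python sketch below picks pairs of cores that share an L2 cache. It assumes, purely for illustration, that cores are numbered so that cores 0 and 1 share one L2, cores 2 and 3 share the next, and so on; real core numbering and cache topology are reported by the operating system or hypervisor and may differ. The helper names are hypothetical.

```python
# Assumption (hypothetical numbering): cores 2k and 2k+1 share an L2 cache.
# On a real system, read the topology from the OS or hypervisor instead.


def l2_sibling_pairs(n_cores: int) -> list[tuple[int, int]]:
    """List the core pairs that share an L2 under the assumed numbering."""
    if n_cores < 0 or n_cores % 2 != 0:
        raise ValueError("expected a non-negative, even number of cores")
    return [(core, core + 1) for core in range(0, n_cores, 2)]


def shares_l2(core_a: int, core_b: int) -> bool:
    """True if two distinct cores fall in the same assumed L2 pair."""
    return core_a != core_b and core_a // 2 == core_b // 2


# Candidate placements for two-vCPU VMs on a 16-core processor.
for pair in l2_sibling_pairs(16):
    print(f"pin a two-vCPU VM to cores {pair}")
```

A pinning policy built this way would place each two-vCPU VM on one pair and would avoid splitting a VM across cores such as 1 and 2, which sit in different pairs under this numbering.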
There will also be some significant enhancements to our memory controller. This is the first major memory controller overhaul since the introduction of the Quad-Core AMD Opteron processor back in 2007. Back then, everyone was looking at virtualization, but not as many were deploying it. These new memory controller enhancements were designed with virtualization in mind so that there are more optimizations around the memory handling for virtualization.
Someone else also asked about support for Hyper-V and older OSes. We plan to support Hyper-V in the future, just as we do today. In terms of older OSes, there will be some limitations, mainly because older OSes were developed at a time when processors had fewer cores and supported less memory. An older OS can always be run as a guest OS on a virtualized server. AMD collaborates with Microsoft to ensure that new processors are well supported by a range of OS versions. We will publish more info as we approach launch.
“The x86 core (Bobcat) of AMD Fusion APU Ontario will be based on Bulldozer architecture?” – Fabio Mendes
Actually, these are different designs. The upcoming “Ontario” processor will be based on the “Bobcat” core, which has a different core architecture than “Bulldozer.” Some have assumed that “Bobcat” is just a scaled-down “Bulldozer”, but they are, in fact, different. I’m sure that between the two there are similarities and some small sub-components that are shared, but you won’t see the modular design of “Bulldozer” in “Bobcat.”
“Will Bulldozer get a Turbo CORE for single threaded applications, just like the Thuban?” – Björn
Yes. There will be a Turbo CORE feature for “Bulldozer”, with some improvements over what you see in “Thuban” (our 6-core AMD Phenom™ processor). There are some enhancements to give it more “turbo”. This will be the first introduction of Turbo CORE technology in our server processors. We expect that this will translate into a big boost in performance for single-threaded applications, and there should be some interesting capabilities for heavier workloads as well. We’re pretty excited about how this will be implemented with “Bulldozer”, but the specifics of the implementation and the expected performance gains will not be disclosed until launch.
“Which architectural decision for Bulldozer has the biggest impact for server-class products and how does it achieve that impact?” – Andrew Cowley
That is actually a tougher question than it sounds because it depends on what you are looking to impact. I personally believe that what most customers are looking for is better performance per watt with each generation of product. Or, to be more specific, people are looking for greater performance and scalability, but they want to do it in the same power/thermal envelopes that they are used to with today’s servers.
The modular architecture really allows us to do this with “Bulldozer”. In today’s processors there is a lot of circuitry that sits idle for most cycles; it needs to be there for the peak, but most of the time it is just sitting there. That not only eats up power, but also adds to the die space (think: cost).
By creating a modular architecture you have the ability to reduce/share a lot of the circuits that are lightly used, which can help cut down on power consumption and cost.
For those that want more performance, cutting down on the power consumption means that you can get higher clock speeds within the same power/thermal envelopes.
For those looking for lower overall power consumption, the modular architecture helps in that aspect as well.
Because of this modular architecture, we can increase the core count, so if you are interested in database, HPC or virtualization, that higher core count – with real cores – will help boost performance for your applications.
But the key to an architecture like this is understanding how to push the limits without going too far. Sharing everything results in low power consumption, but terrible performance. Sharing nothing results in higher performance, but you get hammered by the power consumption and the cost of the die. So the key to a modular architecture will be how successfully you plan the shared components to meet your design goals.
Stay tuned: in the next update we will cover floating point, compilers and power efficiency.
John Fruehe is the Director of Product Marketing for Server/Workstation products at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied. This blog contains forward-looking statements. Forward-looking statements are generally preceded by words such as “plans,” “expects,” “believes,” “anticipates” or “intends.” AMD Investors are cautioned that all forward-looking statements in this blog involve risks and uncertainties that could cause actual results to differ materially from current expectations.


Will Bulldozer have an integrated bus controller?
Can you elaborate on “bus controller”? That could mean a lot of things.
Will Bulldozer and Bobcat have an integrated northbridge?
Our current products already have an integrated northbridge. “Northbridge” refers to a set of communications technologies that includes things like the memory controller and the HyperTransport controller. In the old days, those were in a separate chip.
I don’t know much about the technical background, but all we graphics artists need is a more powerful workstation processor (we won’t consider the power consumption or cost, we consider only time). It should bulldoze the i7... (2x Bulldozer CPUs in one AMD workstation board, 3.2GHz x2, 24MB cache, 12 cores x2, quad-channel DDR3... too greedy? Yes!!!)
Mr. John, will we see a 32-core Bulldozer after all? Why don’t you make such a chip? Or is the manufacturing process not small enough to support 32 cores?
I won’t comment on any future core counts.