Bulldozer 20 Questions, Part 3
“Can we get a dual socket overclockable board with 2-PCI-e X16 slots plus pcie-8,etc..Solid state caps,IE: all the top parts” – David Hunt
Client systems will be available in single-socket versions. The two-socket client market is very small and getting smaller every quarter. It hit its peak a few years ago with dual-core processors (giving you 4 total cores), but quad-core processors essentially crushed that market. Now the fact that you will get 8 cores in a single socket will probably wipe out what little is left of that market. It is hard to justify putting resources into a market that was, at its peak, 0.8% of the total client market and exited Q4 of last year at 0.4% of the client market.
AMD Opteron™ processors will be available in dual-socket systems, but those will not have the overclocking capability that you are looking for. Server customers do not manually overclock their systems; they need to ensure reliability. While you can run a processor outside of its specified operating range (overclocking it to a higher frequency), when you do that you take on some risks.
First, you diminish the useful life of the processor. By how much is hard to say because all processors are different. But overclocking increases the potential for failure, not something you want on a server. Secondly, when you overclock, sometimes your results are not what you expected. 2+2=5. Not a problem if you are playing a game, but a real problem if you are using a server to run automated systems, make business decisions or keep financial records straight.
Marketing AMD Opteron™ processors as “overclockable” would not help us grow server share in the commercial market (and would most likely hurt us), so it is not something that we will be pursuing.
“How much NUMA architecture will the Bulldozer be, or in other words, if I have a 4-socket Bulldozer how much will memory access differ between access to memory local to the socket and access to memory from other CPUs in the box.“ – Mikael Ronström
With Non-Uniform Memory Access (NUMA), each processor has its own memory controller and its own memory banks (local access). Processors can also access memory attached to the other processors in the system (remote access). NUMA enables this access.
In the past, in a 4P system, some memory locations were 2 hops away, leading to greater latency. In today’s AMD Opteron™ 4000 and 6000 Series platforms, there are enough HyperTransport™ technology links that all of the remote memory calls are only one hop away.
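To make the hop counts concrete, here is a minimal sketch that counts links between sockets with a breadth-first search. Both topologies are assumptions chosen for illustration (a square of four sockets for the older case, and a direct link between every pair for the newer case); they are not the actual link maps of any AMD platform, and the function names are hypothetical.

```python
from collections import deque

def hop_counts(links, source):
    """Breadth-first search: links crossed from `source` to every socket."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

def max_hops(links):
    """Worst-case hop count between any two sockets in the system."""
    return max(max(hop_counts(links, s).values()) for s in links)

# Assumed older 4P layout: each socket links only to two neighbors (a square).
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

# Assumed layout with enough links for a direct connection between every pair.
fully_connected = {s: [t for t in range(4) if t != s] for s in range(4)}

print(max_hops(square))           # 2: the diagonal socket is two hops away
print(max_hops(fully_connected))  # 1: every remote socket is one hop away
```

The sketch only counts hops; it says nothing about actual latency, which also depends on the memory controller and link speed.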
There will be enhancements to our memory controllers, things we can’t talk about just yet, that we expect to help reduce the time to access memory, both locally and remotely.
HT Assist, a feature that was introduced with our six-core AMD Opteron™ processors, helps reduce memory traffic and speed up memory access from remote locations.
As for actual memory access timings, I will leave that for launch.
“Can you explain how is your Multi-threading technology different from Intels? What are the advantages?” - Vygantas
We use actual, physical cores to handle multiple threads. Intel does this too, but they also use HyperThreading technology to execute two threads on a single core, which can create bottlenecks.
The challenge with HyperThreading is that it exploits gaps in the execution pipeline in order to get that second thread running. In a world where you have inefficiently executing applications, you have gaps in the pipeline and you can get that second thread executing. But in efficient software, you have less opportunity to take advantage of those gaps, and you potentially end up with little or no gain. Some application vendors actually recommend turning off HyperThreading for better performance.
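The "gaps in the pipeline" argument can be illustrated with a deliberately simple toy model. It assumes one thread fills a fixed fraction of a core's issue slots and that a second, identical thread can only use what is left over. The function name and the model itself are illustrative assumptions, not a description of how any real processor schedules work.

```python
def smt_gain(utilization):
    """Toy model of running a second thread on the same core.

    `utilization` is the fraction (0..1] of issue slots one thread fills.
    Returns the fractional throughput gain from adding an identical thread.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Two identical threads together cannot exceed a completely full pipeline.
    combined = min(1.0, 2 * utilization)
    return combined / utilization - 1.0

# Inefficient code leaves many gaps; efficient code leaves few.
for u in (0.5, 0.75, 0.95, 1.0):
    print(f"one thread fills {u:.0%} of slots -> gain {smt_gain(u):.0%}")
```

In this model the gain shrinks toward zero as the first thread's utilization approaches 100%, which is the point made above about efficient software.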
We will have cores, real physical cores, and that leads to better overall scalability. In heavily optimized systems, you aren’t fighting over execution pipelines because every thread has its own integer core. There is less system overhead involved in parsing out the threads because cores are all pretty much equal.
Take this scenario: a 4-core processor with HyperThreading will have all 4 physical cores actively handling threads. Now you need to execute a 5th thread. Do you put that thread on an already active core, reducing the processing of the thread already on that core because the two threads now have to share the same execution pipeline, or do you wait a cycle and hope that one of those cores frees up? There is a lot more decision making when you have “big cores and HT cores”, but in the AMD world, you could have 8 or 16 cores, so the 5th thread just goes onto the next available physical core. It is much easier and much more scalable.
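The 5th-thread scenario can be sketched in a few lines. The placement policy below (fill every physical core once, then double up) and the function names are assumptions for illustration; real operating system schedulers are considerably more involved.

```python
def place_threads(num_threads, physical_cores, hw_threads_per_core):
    """Spread threads across cores, one per core first, then double up."""
    capacity = physical_cores * hw_threads_per_core
    if num_threads > capacity:
        raise ValueError("more threads than hardware threads")
    load = [0] * physical_cores
    for i in range(num_threads):
        load[i % physical_cores] += 1  # round-robin over physical cores
    return load

def threads_sharing(load):
    """Count threads that share a core with at least one other thread."""
    return sum(n for n in load if n > 1)

# 4 physical cores, 2 hardware threads each: the 5th thread must share a core.
four_core_smt = place_threads(5, physical_cores=4, hw_threads_per_core=2)
print(four_core_smt, threads_sharing(four_core_smt))    # [2, 1, 1, 1] 2

# 8 physical cores, 1 thread each: the 5th thread gets its own core.
eight_core = place_threads(5, physical_cores=8, hw_threads_per_core=1)
print(eight_core, threads_sharing(eight_core))          # [1, 1, 1, 1, 1, 0, 0, 0] 0
```

In the first case two threads end up competing for one execution pipeline; in the second, no placement decision is needed beyond picking the next idle core.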
“What kind of compiler support will Bulldozer class cores receive from your partners and will intel’s ICC compiler have the support for Bulldozer’s AVX instruction set(and not discriminate it via CPUID flag like in the past with previous Opterons)?” – Ivan
We are working with all of the key compiler vendors to help ensure support of Bulldozer. We are spending a lot of time working with the Open64 compiler folks to make sure that there is support, as well as the PGI Group, GCC compiler and of course, Microsoft®.
AVX will require applications to be recompiled in order to take advantage of 256-bit floating point (either ours or our competitor’s).
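As a rough illustration of why the wider registers matter, the sketch below counts how many vector instructions an idealized loop would need at 128-bit versus 256-bit width. The function name is hypothetical, and the count assumes perfect vectorization; instruction count is not the same as speed, which depends on how the hardware executes each instruction.

```python
import math

def vector_ops_needed(num_elements, register_bits, element_bits):
    """Idealized count of vector instructions to touch every element once."""
    lanes = register_bits // element_bits  # elements processed per instruction
    return math.ceil(num_elements / lanes)

n = 1024  # single-precision (32-bit) floats
print(vector_ops_needed(n, 128, 32))  # 256 instructions at 128-bit width
print(vector_ops_needed(n, 256, 32))  # 128 instructions at 256-bit width
```

Code compiled only for 128-bit vectors keeps issuing the narrower instructions, which is why a recompile is needed to use the 256-bit form.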
I can’t comment on the ICC compiler; I recommend asking Intel that question.
“Please explain why having two separate integer cores is better than one fat one. For example, if each core has two ALUs and two AGUs and 16 KB of L1 cache, what if it was one integer core with 4 ALUs and 4 AGUs and 32KB cache? Theoretically, you’d get about the same performance for multi-threaded programs and better single threaded performance.” – Ryan
We get asked that a lot. The key is that a single core that would be able to compete with the throughput of two smaller cores would consume a disproportionate amount of die space and consume more power. Taking Bulldozer and turning each module into one “big core” instead of two cores with some shared resources would net you a disproportionately higher price and disproportionately higher power consumption.
In reality, what we are doing is driving efficiency. And don’t worry about single-threaded performance: we have already stated publicly that Bulldozer single-threaded performance is expected to be higher than that of our current core architectures.
What you have to keep in mind is that we are bringing innovation and driving towards the future. Back in 2005 when we did the first x86 dual core processors, there were some that argued that single core processors were better because a.) they had higher clock speed and b.) no applications really take advantage of multiple cores. Where are those people today?
When we brought x86-64 to the market, there were those who said 32-bit applications were better because they were faster and nobody really needs to access more than 2GB in most cases anyway. Where are those people today?
In this business you can either look out the windshield and focus on the road ahead and the technology that is coming up in the future or look in the rear view mirror and constantly obsess about how things were in the past.
The rules are changing now, just as they did in the past. AMD will continue to innovate.
John Fruehe is the Director of Product Marketing for Server/Workstation products at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied. This blog contains forward-looking statements. Forward-looking statements are generally preceded by words such as “plans,” “expects,” “believes,” “anticipates” or “intends.” AMD Investors are cautioned that all forward-looking statements in this blog involve risks and uncertainties that could cause actual results to differ materially from current expectations.
POSTED IN: AMD Opteron, Bulldozer


Is each core less powerful than a normal Phenom II core? Or, because of the increased efficiency, will they be more powerful?
In another part (part 2?) you said that it would be 80% of a normal core. Is this right?
thanks
The 80% number does not reference what you are referring to. The new cores will be higher performing than our current cores and will be lower power as well.