It’s All About the Cores
As we embark on an interesting 2010 and even more interesting 2011, it is clear that the age of clock speed is behind us; it’s all about cores for the next few years until “Fusion-based” computing hits the server market.
And let’s be clear – AMD is designing its AMD Opteron™ processors to have the cores that customers need to drive their enterprise applications.
To begin with, let’s dissect the difference between threads and cores. Cores are physical blocks of logic in the processor that can run applications. In the old world, it was simple: one CPU = one core. Today, Six-Core AMD Opteron processors (formerly code-named “Istanbul”) are quickly becoming the mainstream, and by the end of this quarter, eight and twelve will be the operative core counts per processor.
Threads on the other hand aren’t physical – they are software-generated tasks that can execute independently. In order for a program to run on multiple cores, you need to thread the program, or run multiple tasks simultaneously. The operating system takes the threads spawned by the program and schedules them to run on available cores.
So – cores are like bikes, threads are the riders. Running more threads increases throughput for applications as long as you have available cores. If you have threads waiting to be scheduled and no available cores – you have a bottleneck.
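To put rough numbers on the bikes-and-riders picture, here is a minimal sketch in Python. It is a toy model, not how any real operating system scheduler works: it assumes every thread needs one equal time slice and that each core runs exactly one thread at a time. The function names are made up for illustration.

```python
import math


def scheduling_rounds(runnable_threads: int, cores: int) -> int:
    """Rounds needed to give every runnable thread one time slice.

    Toy assumption: one thread per core per round, all slices equal.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    return math.ceil(runnable_threads / cores)


def waiting_threads(runnable_threads: int, cores: int) -> int:
    """Threads left waiting in the first round (the bottleneck)."""
    return max(0, runnable_threads - cores)


if __name__ == "__main__":
    cores = 6  # e.g. one six-core processor
    for threads in (4, 6, 12):
        print(
            f"{threads} threads on {cores} cores: "
            f"{scheduling_rounds(threads, cores)} round(s), "
            f"{waiting_threads(threads, cores)} waiting"
        )
```

With 12 riders and 6 bikes, half the riders wait for a second round. With 6 or fewer riders, nobody waits, and adding more threads only helps until the cores run out.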
There are two major strategies to getting more efficiency out of your server. The first is the simple, straightforward way – feed that application more cores. That is why you are seeing 4+ cores in processors today. Nobody will argue against the point that giving applications more real cores will help increase overall throughput. However, some see another answer and wonder why AMD has chosen not to go down that path.
Simultaneous Multithreading (SMT) is a method for squeezing two threads into one core. SMT was first researched by IBM in 1968 and introduced to x86 processors by Intel in 2002 under the name of HyperThreading. That sounds great, in concept. Carpooling is more efficient than giving everyone their own car, right?
Well, carpooling falls apart if the two employees live too far from each other and the office is close. If Bob lives 3 miles north of the office and Mary lives 2 miles south of the office, it really doesn’t make sense for them to carpool. In the bike and rider example above, think of SMT as a tandem bike. Yes, it can move two riders, but not as quickly or efficiently as two separate bikes.
The challenge with SMT is that as a technology, it forces two threads to share a single physical core.
Consider a software thread running on a hardware thread, where a second runnable software thread is then executed on another hardware thread on the same core. This could be triggered by an event like a stall due to a cache miss. The second thread does not necessarily thrash the cache; in fact, there are situations where the cache lines used by both threads are shared, resulting in little cache churn. However, in many cases the second thread causes the cache to be refilled with its own data, requiring the first thread to refill the cache in turn when it resumes execution. This competition for shared core resources is what can result in diminishing returns for SMT-based processors or, worse, in negative performance. (This paragraph was updated for clarity and to correct a statement that could have been misinterpreted…)
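The cache effect described above can be illustrated with a small simulation. This is a sketch under loud assumptions: a tiny fully associative cache with least-recently-used (LRU) replacement, four lines of capacity, and two threads whose accesses alternate strictly. Real processor caches are much larger and behave differently; the names `count_misses`, `CAPACITY` and `LOOPS` are invented for the example.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count cache misses for a sequence of cache-line IDs on an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: the line must be fetched
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def interleave(first, second):
    """Alternate accesses from two threads sharing one core's cache."""
    return [line for pair in zip(first, second) for line in pair]


CAPACITY = 4   # assumed cache size, in lines
LOOPS = 10     # how many times each thread walks its working set

thread_a = ["a0", "a1", "a2", "a3"] * LOOPS
thread_b = ["b0", "b1", "b2", "b3"] * LOOPS  # different data from thread_a

alone = count_misses(thread_a, CAPACITY)
competing = count_misses(interleave(thread_a, thread_b), CAPACITY)
sharing = count_misses(interleave(thread_a, thread_a), CAPACITY)

if __name__ == "__main__":
    print("thread A alone:          ", alone, "misses")
    print("A and B, separate data:  ", competing, "misses")
    print("two threads, shared data:", sharing, "misses")
```

In this toy setup, a thread running alone misses only while warming the cache (4 misses). Two threads with separate working sets evict each other's lines on every access (80 misses out of 80 accesses), while two threads touching the same lines cause no extra churn (4 misses). The real world sits somewhere between those extremes, which is why SMT results vary so much by workload.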
Generally speaking, SMT can give applications as much as an extra 10-20% increase in performance, which feels like that mythical “free lunch” that you were always told doesn’t exist. Well, don’t start eating yet, because there is a dark side to SMT. What if adding that extra thread actually decreased your throughput? What if 8 threads on 4 cores provided worse throughput than 4 threads on 4 cores?
Here are a few examples of opinions on the other side of the SMT discussion:
- A consultant who deals with Cognos, a leading BI software product from IBM, recommends disabling HyperThreading because it “frequently degrades performance and proves unstable.”
- Microsoft recommends turning off HyperThreading when running PeopleSoft applications because “our lab testing has shown little or no improvement.”
- A Microsoft TechNet article recommends disabling Hyper-threading for production Exchange servers, saying it should be “only enabled if absolutely necessary as a temporary measure to increase CPU capacity until additional hardware can be obtained.”
- Advanced Clustering found when running High Performance Linpack (HPL) that “Using HT on the other hand causes a ~10% drop in performance compared to HT not being used.”
There are more examples, but the “free lunch” is obviously not quite as tasty as you might have originally expected.
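The figures quoted so far can be lined up with some back-of-the-envelope arithmetic. The sketch below is a simplification, not a benchmark: it measures throughput in units of "one thread on one dedicated core", applies the SMT effect as a flat percentage (the +20% best case and the roughly -10% HPL result mentioned above), and assumes throughput scales perfectly with real cores, which actual workloads rarely do. The function name is hypothetical.

```python
def relative_throughput(cores: int, smt_gain: float = 0.0) -> float:
    """Throughput in units of one thread running on one dedicated core.

    smt_gain is the fractional change from running two threads per core,
    e.g. 0.20 for +20% or -0.10 for -10%. Perfect scaling with core count
    is an assumption of this toy model.
    """
    return cores * (1.0 + smt_gain)


scenarios = {
    "4 cores, 4 threads, no SMT": relative_throughput(4),
    "4 cores, 8 threads, SMT at +20%": relative_throughput(4, 0.20),
    "4 cores, 8 threads, SMT at -10%": relative_throughput(4, -0.10),
    "8 cores, 8 threads, no SMT": relative_throughput(8),
}

if __name__ == "__main__":
    for name, value in scenarios.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, 8 threads on 4 SMT cores land somewhere between 3.6 and 4.8 units, while 8 threads on 8 real cores reach 8.0. The exact numbers will differ for any real application, but the gap is the point.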
So, if SMT (or “core sharing”) yields both positive and negative results, what is the better answer? How about more cores? When you add more cores, you add more throughput. Period.
When you run multiple threads over multiple cores, you can expect better performance, and that is the AMD strategy. With “Magny Cours” we’re planning 8 and 12 cores per processor running 8 and 12 threads, not 8 or 12 threads sharing 4 or 6 cores. No sharing needed, every thread can be as selfish as it needs to be. Then in 2011, we plan to introduce “Interlagos” and increase the core count again, to 12 and 16. With “Interlagos” we’re designing some shared components that help reduce power consumption and die size, but you won’t see us sharing integer pipelines, the “meat” of the core.
By keeping discrete integer cores, and delivering more of those cores per CPU, AMD is building processors designed to help you get more throughput for your enterprise applications.
Here’s AMD’s Core Commitment for servers:
- AMD is working to deliver more cores for your business critical applications and a wider choice of core configurations. From 4 cores through 12 cores per processor planned for 2010 and 6 to 16 cores planned in 2011, AMD is working to deliver more of the resources that you need to drive your business forward.
- Our cores are real. Threads can run faster when they have their own core underneath rather than having to share. If you have to run 12 threads, we know you would rather have 12 cores with unfettered access than worry about sharing cores.
Of course, there are those who will say, “well, things like SMT can be implemented inexpensively and don’t consume that much power.” To them I ask: historically, hasn’t AMD been the one committed to delivering better value and lower power? Why would we stray from our core principles?
If you can get all the cores you need at the price you need and the power envelope that you need, then why would you ever consider anything else? Why would you ever compromise? Have your cake, and eat it too. THAT is your free lunch. And it’s delicious.
John Fruehe is the Director of Product Marketing for Server/Workstation products at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.


You’re not solving the problem.
The problem is memory bandwidth. You’ve built a giant factory, but it’s in the middle of nowhere with a dirt track leading to it.
How are you going to increase memory bandwidth without killing all gains with increased latency?
Four channels of DDR-3 memory will help keep all of those cores fed and happy.
3 channels, and only on i7. 4 channel technology doesn’t exist yet; at least not in any mainstream implementation I’m aware of. 2 channels is still predominant.
“According to Intel, a Core i7 with DDR3 operating at 1066 MHz will offer peak data transfer rates of 25.6 GB/s when operating in triple-channel interleaved mode”
That may be true as a client statement, but in the server world 3 channels is more prevalent.
I can’t wait until the Bulldozer cores roll out of GlobalFoundries. My next rig will definitely be using a Bulldozer CPU.
It’s true, SMT isn’t the best idea, it doesn’t really help the XBOX 360 and the PS3 that’s for sure. It might help boost it in some areas, but SMT is horrible in others.
It’s also why I don’t see why people are falling all over the Core i7 when the Core i5 750 is pretty much the same.
Great timing on explaining something I’m just starting to learn about. I only barely understand pipes & didn’t know they were pertinent to hyperthreading.
I suppose my only criticism would be that now I’m left wondering how an UltraSPARC T2 fits 8 threads on 1 core.
So much of what happens depends on the architecture. The UltraSPARC RISC architecture is different from x86, so threads are handled a bit differently and SMT does a better job there than it does on x86.
Awesome information! Keep up the great work, AMD. Your rivalry with Intel makes both companies better and pushes the bleeding edge of technology for all Americans.
This is great, and I appreciate the information, but there are a few things I’ve never gotten a good understanding of. So you’ve got this die, the chip on silicon. And when we make cores, we just split the die up into even more sectors, and decrease the amount of transistors, right? But we then have several L1 caches, one for each core, which don’t need to be swapping memory as often as when there was a single core. So if I understand correctly, ideally we could have a single core for each thread per process. Currently I have… 3 running processes, all of which are single threaded; the rest are sleeping. So would, say, four cores be perfect for me? And in the future, how much will we dismember processes and split them up into threads, I mean on the consumer level?
Thanks
I am a long-time AMD customer, and have always preferred AMD because of their greater value, but what are you guys doing to improve single-thread performance? The dual-core i3 trades blows with the Athlon II X4 620 in multithreaded environments. Like it or not, hyperthreading is a part of that, but an even larger part of that is the i3’s insane single-thread performance. Can we expect to see a similar increase in single-threaded performance when Bulldozer arrives?
As a server guy I can’t speak to the client side. However, I can say that single thread performance is going to become less and less important. If you remember back in the “old days” when the first 32-bit Windows came out, everyone said the old stuff, the 16-bit versions, ran faster. But as 32-bit became the norm, 16-bit performance became less important. Single threaded performance might be important for some applications like games, but that is because many of them were written when the target platform had one core, so they were optimized around that.
The software being written today has a target platform with 4 cores (on the consumer side), so when it gets to market (in the bulldozer timeframe), it will be optimized for multiple cores and the “single core performance” will be less important. If you are trying to judge the value of products in the future you can’t use the performance metrics of the past.
This is a very clear explanation. Even I felt like I understood it. It does seem that AMD is concentrating on the server market and abandoning the consumer space? I know that consumers buy $600 commodity products. But some, like me, average about $2K just for the box and would like to see a competitor to Intel. Finally, my sense is that after the Athlon 64, AMD CPUs tend to run hot for a given amount of performance. I now want a completely silent PC — and not an Atom!
Well, I can’t speak to the client side since I am a server person. Interesting observation, but as a server person, I always feel like the consumer guys get all of the attention.
I do have a very silent PC in my family room. I am sitting 3 feet away from it right now and I cannot hear it at all. It is a home theater PC: a 65W Phenom quad core, a CoolerMaster fanless heatsink, 4GB of memory and an Antec case with two 120mm fans. Virtually silent, and very powerful.