Chipping Away the Façade on Compilers and Benchmarks for AMD Processors

by Margaret Lewis

If you follow my Virtualization Blog, you know about my efforts to chip away at façades – false, superficial or artificial appearances – around processor performance. The first week of this new decade saw the façade of the Intel compiler crumble, with a number of articles discussing how this compiler allegedly generated crippled code for AMD processors. These articles were reporting on the US Federal Trade Commission (FTC) antitrust complaint against Intel, posted on Dec 16, 2009.

A compiler is a critical piece of software for a processor.  The compiler takes statements written in a specific programming language and turns these into the machine language that the processor understands.  The resulting “executable code” is what runs when you start up a program. By necessity, the relationship between the compiler and processor is very tight – the compiler must understand the specific features and functions of a processor so it can generate the best possible code.

A blog post on Dec 30, 2009 by Agner Fog, a Danish expert in software optimization, explains how a compiler can create code for different processor types. Mr. Fog says that a compiler can “make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU.” However, a system is not limited to detecting CPU functionality; it can also detect CPU brand.
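To make the difference concrete, here is a minimal sketch of the two dispatch strategies. It only models the selection logic: real dispatchers query the CPUID instruction in native code, and the path names, the `Cpu` class and both functions are hypothetical names invented for this illustration, not taken from any actual compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset  # instruction-set extensions the CPU reports


# Code paths ordered from most to least optimized (illustrative names).
CODE_PATHS = [("sse3", "sse3_path"), ("sse2", "sse2_path")]
BASELINE = "generic_path"


def dispatch_by_feature(cpu):
    """Pick the best path the CPU says it supports, whoever made it."""
    for feature, path in CODE_PATHS:
        if feature in cpu.features:
            return path
    return BASELINE


def dispatch_by_brand(cpu, favored_vendor="GenuineIntel"):
    """Check the brand first; any other vendor gets the baseline path."""
    if cpu.vendor != favored_vendor:
        return BASELINE
    return dispatch_by_feature(cpu)


# Two hypothetical CPUs that report the same instruction sets.
intel = Cpu("GenuineIntel", frozenset({"sse2", "sse3"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "sse3"}))

for cpu in (intel, amd):
    print(cpu.vendor, dispatch_by_feature(cpu), dispatch_by_brand(cpu))
```

With feature-based dispatch both CPUs land on the same optimized path. With brand-based dispatch the second CPU falls back to the generic path even though it reports the same instruction sets.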

Performance questions can arise when code paths are selected based on the “façade” of CPU brand rather than on the underlying functionality of the CPU. What people don’t always understand is the prominent role that a compiler plays in certain benchmarks. For example, SPEC offers an industry-standardized, CPU-intensive benchmark suite, SPEC CPU2006, that stresses a system’s processor, memory subsystem and compiler. SPEC provides the source code for this benchmark, so to run these tests you either compile the benchmarks with C99, C++98 and Fortran-95 compilers or use a pre-compiled set of benchmark executables.

The potential for a compiler to skew benchmark results without a user being aware exists when a benchmark is distributed in binary form. People running such a benchmark might not know how the code was built – which compiler was used, or whether the build relies on components such as performance libraries. This could shortchange the industry – and may even cost companies money and lost productivity when decisions are made on the basis of benchmark results shaped by compiler manipulation.

Let’s look at an example of how the compiler can influence a SPEC CPU2006 score. Below is a chart from AMD’s web site comparing published SPECint®_rate2006 performance of two-socket servers.

Looking at the scores for the different models of AMD Opteron™ processors listed on the charts, you immediately see that adding cores increases integer performance as recorded by SPECint_rate2006. However, the 62% performance gain between the older Quad-Core AMD Opteron processor and the newer Six-Core AMD Opteron processor seems to be more than what might be expected from added cores. Some folks in the industry even picked up on this increase in integer performance and asked why.
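A quick back-of-the-envelope check shows why the number stands out. The sketch below assumes throughput scales perfectly linearly with core count and that nothing else changed, which is a simplification; the only figures taken from this post are the 62% gain and the move from four to six cores per processor, and the function names are made up for the illustration.

```python
def expected_gain_from_cores(old_cores, new_cores):
    """Fractional gain if throughput scaled perfectly with core count."""
    return new_cores / old_cores - 1.0


def residual_per_core_gain(observed_gain, old_cores, new_cores):
    """Gain left over after dividing out the ideal core-count scaling."""
    return (1.0 + observed_gain) / (new_cores / old_cores) - 1.0


core_gain = expected_gain_from_cores(4, 6)
residual = residual_per_core_gain(0.62, 4, 6)

print(f"gain explained by cores alone: {core_gain:.0%}")   # 50%
print(f"gain left to explain per core: {residual:.0%}")    # 8%
```

Under that idealized assumption, going from four to six cores accounts for at most a 50% gain, which leaves roughly 8% per core to be explained by other factors such as the compiler.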

Unlike the older scores, the newer SPECint_rate2006 scores were achieved using The Portland Group PGI Server Complete Version 8.0 and x86 Open64 4.2.2 Compiler Suite (from AMD). This represents the first time that the x86 Open64 compiler from AMD was used to run this benchmark suite. AMD understands the value of having a compiler that offers a high level of advanced optimizations and multi-threading for its processors, and this is why we now offer the x86 Open64 Compiler Suite as an alternative to the developer community. In fact, we have just released x86 Open64 v4.2.3, an updated version with optimizations for our upcoming AMD Opteron 6100 Series processor (codenamed “Magny-Cours”).

A couple of points to stress – the x86 Open64 Compiler Suite offered by AMD is intended to simplify and accelerate development and tuning for x86, AMD64 (AMD x86-64 Architecture), and Intel64 (Intel x86-64 Architecture) applications – it is not exclusive to AMD processors. It is based on Open64, an open source, optimizing compiler that supports a variety of micro-architectures – so AMD code is available for the entire community to review. Check out the most recent AMD Developer Blog where the AMD x86 Open64 Compiler Team Talks about Features and Optimization Flags.

At AMD we are not trying to build a façade around our compiler efforts. We are using the x86 Open64 Compiler as a general purpose compiler for high performance benchmarks and applications. AMD is also dedicated to working on optimizations for mainstream compilers, like Microsoft compilers and GCC, the GNU Compiler Collection, which are the compilers used by a majority of developers.

On the other hand, you could take the approach suggested by Agner Fog: “Never trust any benchmark unless it is open source and compiled with a neutral compiler, such as Gnu or Microsoft.”

What are your thoughts on compilers and benchmarks? Do you feel that it is alright to optimize an x86 compiler to the exclusion of competitive processors? Do you think silicon vendors should be more open about compiler optimizations and how they may influence benchmarks?

Margaret Lewis (@margaretjlewis) is a Product Marketing Director at AMD. Her postings are her own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation.


5 Comments

  • the Nitpicker January 28, 2010

    To be true to my nickname, the decade will only start NEXT YEAR... this is the last year of the first decade of the new century/millennium.

    • Wikipedia February 10, 2010

      Not so fast, Mr. Nitpicker! There is no right or wrong answer. It’s all in the context. A decade could start in any new calendar year. Check out the definition on Wikipedia. I am not saying they are absolutely correct, but I’d tend to agree with them.

  • John D. Carver February 20, 2010

    I simply refuse to use anything other than an AMD processor in ALL my builds and repairs.

    AMD is best by design and the other guy is a predator to one’s billfold.

    The worst example of the other compiler’s sleight of hand was in early voice recognition software.

  • While PathScale was the reason AMD could leverage the performance benefits of Open64 at all. We have subsequently started a project called Path64, which will be our way to share early access to all the newest features and performance enhancements of our next release. Historically we focused mostly on AMD processors, but the market winds have changed and we find Intel performance increasingly important. With all the noise about the FTC findings, many facts are commonly missed. Open64 does a far, far worse job of optimizing for Intel processors than Intel ever did for AMD. Open64 on Core 2, Core 2 Duo and Nehalem is all but completely unsupported, and the CPU detection is wrong at best. In the next final release of PathScale we have this fixed and have truly started to balance our optimization efforts. I hope AMD will quit with the benchmark-centric optimizations and truly lead by example. I’m happy to work with any AMD or open source developer who is interested in similar goals.

    Feel free to find us at #pathscale – irc.freenode.net

    • H. Chu (CTO - Symas) April 21, 2010

      Just to lend another view point – I’ve found that code produced by gcc with AMD optimizations also runs faster on Intel chips than gcc/Intel optimizations. More specifically, gcc -march=amdfamily10 binaries run faster on Core2 chips than gcc -march=core2. So from what I can see, AMD is actually doing the right thing – producing code that is the best they possibly can, regardless of whose chip you run on.
