AMD OpenCL™ APP SDK 2.6: introducing OpenCL™ 1.2 preview and Static C++ kernel language
A few weeks ago, in my previous blog post, I included a teaser about the imminent availability of the latest AMD APP SDK and of a preview of many OpenCL™ 1.2 features. In fact, so many great new features are being introduced that I am struggling to order them. The new SDK and the supporting drivers and files discussed in this blog can be downloaded here.
Let’s start with the OpenCL 1.2 preview. Key new features supported in the preview are host access flags for memory objects, greater flexibility for 1D and 2D image and buffer representations, memory object migration, a new generalized image creation API, and more (see the list below). These preview features are currently supported only on the GPU, but during the first half of 2012 expect a complete implementation of all required OpenCL features and select extensions on both CPU and GPU. Start making your plans now to join us at AFDS ’12 in Bellevue, WA this coming June to see demonstrations using the new capabilities.
OpenCL 1.2 aside, one of the new features that I am really excited about is the preview of the OpenCL Static C++ Kernel Language Extension. This is a version of C++ that can be used to write OpenCL kernels. “This is huge!” was a comment from one of our ISVs when I disclosed this capability during a meeting at SC’11. Key capabilities of OpenCL Static C++ include kernel and function overloading, kernel and member templates, inheritance, friend classes, and more. For full details on supported and unsupported C++ capabilities, please read the specification for the OpenCL Static C++ Kernel Language Extension.
Both the OpenCL 1.2 and the OpenCL Static C++ Kernel Language Extension previews are supported in a special driver release that can be downloaded from the OpenCL SDK download page.
Continuing our commitment to make OpenCL easier to use, we are including the new Khronos C++ Wrapper API with this SDK. This new API brings many benefits to developers writing host-side code in C++. It is no longer necessary to specify the platform, context, or queue host-side objects, which greatly simplifies host-side code. Programs are automatically built on creation for all devices in the associated context, and error checking is included by default. Finally, the new make_kernel function replaces the need for many OpenCL commands and helps improve type checking and overall code robustness. Taken together, these changes mean that it is now possible to write a complete OpenCL Hello World, including host-side code, in just 20 lines of C++.
This release was designed to provide comprehensive performance improvements, both on the CPU side with key run-time performance improvements and support for both AVX and FMA4 instructions, and on the GPU side with the addition of asynchronous data copy and kernel execution (preview), support for atomic counters, and support for the cl_amd_media_ops2 OpenCL extension.
Starting with AMD Catalyst™ software 11.11, we now also include the OpenCL runtime in the AMD Catalyst software Linux drivers. This ensures that the largest possible number of end users can run OpenCL applications.
Key features supported in SDK 2.6 and the AMD Catalyst™ software 11.12 drivers include:
- Inclusion of the Khronos C++ Wrapper API
- OpenCL runtime integration into the Linux AMD Catalyst™ drivers, in addition to the Windows® drivers.
- Multi GPU support on Linux platforms.
- PX5 support.
- Support for AVX extensions for CPUs that support this extension.
- Support for FMA4 extensions in OpenCL built-in function libraries for CPUs that support this extension
- Kernel Reflection: query kernel parameters and enable use of OpenCL kernels in data-driven applications.
- Support for atomic counters on Fusion devices.
- cl_khr_fp64 support on AMD Radeon™ HD 69xx graphics devices.
- Redesigned OpenCL run-time on the CPU, which helps improve performance significantly.
- Support for the cl_amd_media_ops2 extension, exposing hardware capabilities for accelerating image related processing.
- Async copies preview (set environment variable GPU_ASYNC_MEM_COPY=2 to enable).
The OpenCL 1.2 preview includes the following capabilities:
- Host access flags for memory objects enable more efficient buffer handling
- Pattern-based GPU buffer and image initialization eliminates the need for certain buffer/image transfers
- Memory object migration supports transfer of a buffer before it is needed
- New generalized image creation API
- Enhanced image/buffer map operations
- OpenCL 1.2 CPU device partition including partition of a CPU after addition to a context
- Generalized 1D and 2D images, image arrays, and image <-> buffer interop
More details on how to access the OpenCL 1.2 preview are provided on the OpenCL SDK 2.6 developer page.
gDEBugger version 6.1 offers major performance and robustness improvements over version 6.0 and can be downloaded for use with this SDK from http://developer.amd.com/gDEBugger.
- Integrated with Microsoft® Visual Studio®
- Stand alone version
- Registration no longer necessary for obtaining a license
APP KernelAnalyzer v 2.0:
- Support for AMD Radeon™ HD 7000 series GPUs (compilation only – no analysis)
- Support for AMD Catalyst™ software revisions through 11.11
- Support for compiling kernels with the installed driver (select Installed Driver under the CAL version in the Options panel)
- Format and Target Object Code are now separated.
APP Profiler v2.4 includes several key new features, including:
- A kernel occupancy analyzer which estimates, for each kernel dispatch, the number of in-flight wavefronts on a compute unit as a percentage of the theoretical maximum number of wavefronts that the compute unit can support. In addition to reporting the occupancy percentage, the profiler can display a report which can help the developer to achieve a higher occupancy percentage.
- The ability to navigate from the API trace to the source code that called an OpenCL API.
- OpenCL API analysis which provides performance suggestions to the developer.
- The ability to filter which OpenCL APIs will be traced.
- Several UI enhancements, including the ability to rename sessions from the Session Explorer window, and the ability to automatically delete profiler sessions when closing a Microsoft® Visual Studio® solution.
- Preview: Support for profiling with AMD Radeon™ HD 7000 series GPUs (requires AMD APP SDK v2.6 and an AMD Catalyst™ software version that supports this hardware).
On a final note – remember to install the AMD Catalyst™ software 12.1 drivers when they are released in January – there are significant new capabilities that are going to be made available then. But more on that later.
Mark Ireton is the Product Manager for Compute Solutions at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.





When will gDEBugger be available for GNU/Linux?
It would be nice to know when you will add support for dual-GPU cards in your SDK.
Keep up the good work.
I was wondering… is it possible to make the Apache HTTP server run on a graphics card?
How about Apache Tomcat?
What are your thoughts?
Great post. Would you be able to post a side-by-side comparison of a 20-line Hello World program and how it used to be, please?
Getting 1305M on my 570@850MHz SLI, but the cards operate at ~50% each, so there’s something wrong really.
AMD have used a modified version with OpenCL. Also the stock version limits the particle count to stop your CPU lagging. Plus the window is a lot smaller, they are using some high res widescreen LCD.
Wow!! I appreciate what you’re performing! I require to relook at screen toaster! Informative and interesting post!!! maintain it up..
Hi Nathan,
I’m running Ubuntu 10.10 (64-bit) trying to get an HD 5870 working for a couple of distributed computing projects. I was directed here by a forum post and am hoping the info here can help me. Being very much the Linux newb I have a few questions about your procedure and would appreciate some help if you have the time.
1) When installing the Stream SDK/OpenCL libraries, to what directory might someone typically extract the files? I downloaded the .tgz file to the desktop, but I figure there’s a better place to put it.
2) When editing the .bashrc file, is there any place in particular to put the export statements? Can I just add them at the end? One right after the other?
3) You say to extract the icd-registration file to your /etc folder. Can you be a little more specific?
4) Perhaps I should have asked this first, but are any steps unnecessary if I’m not a developer and just need the driver and OpenCL libraries installed? The GPU will be running programs that have already been developed and are ready to use.
Please excuse my newbness and thanks for any help you can provide.
Mark