OpenCL Changes the Game
Up until now, GPGPU has been a research technology for early adopters – a new, promising experimental capability for scientists, engineers, financial professionals, and others running compute-intensive applications. Two elements have kept GPGPU largely in the ivory tower: first, the available APIs were proprietary; second, the GPU has been treated as an independent application accelerator instead of as part of a balanced heterogeneous architecture. OpenCL is a game-changing development in both respects, and AMD is taking an important step on that journey today.
In the past, proprietary programming models like CUDA limited target platforms to those from a single vendor. This may have been fine for students experimenting with a new approach, but mainstream ISVs and other large-scale developers need the flexibility inherent in industry standards. With a standard, cross-platform API, developers can deliver solutions on multiple vendors’ hardware while streamlining their development processes and timelines. This is what they’re waiting for – we hear it every day.
Of course, no application runs entirely on the GPU. Beyond the obvious need for CPUs to drive execution, most mainstream applications are heterogeneous in nature: they have some functions that accelerate well on multicore CPUs, and others that are perfectly suited to a GPU’s data-parallel architecture. A good development platform needs to take that into account – this is the difference between GPGPU as a niche accelerator and GPGPU as a new baseline feature, ready for tomorrow’s systems and applications.
Today, AMD is delivering the first beta release of an OpenCL implementation for the CPU. Managed by the independent Khronos Group, OpenCL addresses the need for a cross-platform, industry-standard approach to development for heterogeneous architectures. This can enable more developers to take advantage of GPGPU acceleration in their applications, but what is even more compelling is the opportunity to build applications that leverage all of the system’s compute resources – CPUs and GPUs – to provide a superior user experience. As the only company that designs and delivers both high-performance GPUs and x86 CPUs to the market, AMD is uniquely qualified to help application developers drive full resource utilization forward without feeling the need to force-fit workloads onto one technology or the other.
With the new OpenCL implementation for the CPU, application developers can begin realizing the promise of heterogeneous computing. A video of a 4P Six-Core AMD Opteron™ processor-based system (24 total cores) running an OpenCL-based fluid/particle simulation can be seen here; for a developer-focused look at how OpenCL forms the basis of an evolving parallel programming ecosystem, see my colleague Margaret Lewis’ blog, Making the Universe Parallel.
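To make the CPU side of the heterogeneous picture a little more concrete: in OpenCL, a kernel is written as the body of a single "work-item", and the runtime launches it over an index space (the NDRange), grouping indices into work-groups that a CPU implementation can spread across cores. The Python below is only a toy emulation of that execution model on a thread pool, written for illustration; `enqueue_nd_range` and `vector_add_kernel` are hypothetical names, not part of AMD’s SDK or the OpenCL API. The OpenCL C kernel it mimics is shown in a comment.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# The OpenCL C kernel this emulates (one work-item per element):
#
#   __kernel void vector_add(__global const float *a,
#                            __global const float *b,
#                            __global float *out)
#   {
#       size_t gid = get_global_id(0);
#       out[gid] = a[gid] + b[gid];
#   }


def vector_add_kernel(gid, a, b, out):
    """Body of one work-item: gid plays the role of get_global_id(0)."""
    out[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel, global_size, local_size, *args, workers=None):
    """Hypothetical helper: run `kernel` once for every global id in
    [0, global_size), split into work-groups of `local_size` ids, with each
    work-group handed to a thread pool (a rough stand-in for a CPU runtime
    distributing work-groups across cores)."""
    if local_size <= 0 or global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the
        # local (work-group) size when a local size is given.
        raise ValueError("global_size must be a positive multiple of local_size")

    def run_work_group(group_id):
        first = group_id * local_size
        for gid in range(first, first + local_size):
            kernel(gid, *args)

    num_groups = global_size // local_size
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # list() forces evaluation so an exception raised inside any
        # work-group is re-raised here.
        list(pool.map(run_work_group, range(num_groups)))


if __name__ == "__main__":
    a = [float(i) for i in range(8)]
    b = [10.0] * 8
    out = [0.0] * 8
    enqueue_nd_range(vector_add_kernel, 8, 4, a, b, out)
    print(out)  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
```

Under CPython the threads mostly illustrate the scheduling structure rather than deliver a speed-up; a real OpenCL CPU implementation compiles the kernel instead of calling a Python function, so this sketch says nothing about actual performance.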
Patricia Harrell is Director of Stream Computing at AMD.
Her postings are her own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
POSTED IN: AMD Opteron
TAGS: ATI Stream, Beta, OpenCL


Hey, what about your GPU products? I thought NVIDIA was ahead with its OpenCL implementation for GPUs.
That’s all well and good, AMD “Guest Blogger at 8:00 am” with No Name,
but it doesn’t in any way help the world’s third-party OSS devs use the generic hardware-assisted “UVD” ASIC for AVC, VC-1 (and to a lesser degree HD MPEG-2) video playback and editing, as found in all current AMD/ATI graphics cards, does it?
You’re so FAR behind NVIDIA’s CUDA ASIC and their massive internal and third-party developer code drops, it’s not funny… and it hasn’t been for well over a year now….
If you can’t even complete an internal review to find a way to open up the “UVD”, never mind actually commit to finally putting it at the top of the “Do Now, What Needs To Be Done Now” list so mass consumers and devs can try to catch up with the CUDA ASIC and other third-party hardware-assisted programming,
then how can we even consider your other passing thought, making them use this new OpenCL on the graphics shaders etc. instead, while the existing “UVD” ASIC hardware sits there unused by virtually anything other than a very limited, narrowly focused DVD Windows payware app, and certainly by nothing in the third-party OSS Linux field ANYWHERE….
This OpenCL may or may not become something one day, but not within the lifetime of the existing “UVD” you have put on virtually every single current ATI card.
But you didn’t wrap it up in an existing generic API extension from day one, or even give out the “UVD” ASIC datasheets, or make any “UVD” code drops to the FFmpeg and x264 open code bases, where this would be put to very good use by the world’s MASSES for playback AND real-time frame-based AVC editing, etc….
Shame on you for throwing this perfectly good ASIC hardware away without a care.
UVD playback hardware acceleration is already publicly accessible today through DirectX Video Acceleration (DXVA and DXVA2) on Windows, including decode acceleration of HD videos. Many developers (Microsoft as well as commercial vendors like Cyberlink and ArcSoft) have enabled hardware acceleration through that API in their playback applications on ATI Radeon HD graphics since the HD 2000 series (2007), and even open-source players like Media Player Classic have hardware acceleration using UVD.
AMD has also supported accelerated video transcoding and editing through ATI Stream with partners like Cyberlink for media conversion software (MediaShow Espresso) and HD video editing (PowerDirector 8).
Regarding cross-platform access to UVD, OpenCL will be an excellent vehicle, and we hope to see APIs in OpenCL that enable that capability. No hardware vendor offers this capability in OpenCL today.
AMD has made a commitment to enable developers to write cross-platform applications through industry standards. That’s why OpenCL is so important – unlike the API mentioned in the comment, which remains proprietary.
Right now, we’re both working on the OpenCL implementation for the GPU, but the difference is that AMD is really focused on delivering a complete OpenCL development platform. Our ATI Stream SDK 2.0 will benefit parallel application development across CPUs and GPUs (something our competitor can’t say). You’ll see AMD’s OpenCL for GPU implementation later in 2009, and that complete platform is going to be extremely impactful to developers.
Patricia,
How can we get you to give us a presentation on what AMD is doing with OpenCL and your Stream initiative? We are seeing strong interest in the defense community, and the community is seeing lots of activity from NVIDIA around GPU use in compute-intensive environments.
Dennis, thanks for your comment – we’ve seen a lot of promising work in defense, signal processing, and other technical government applications. We’ll connect with you directly on scheduling a presentation.
Intel Parallel Studio helps developers exploit multiple cores
Review: Intel tools work hand-in-hand with Microsoft Visual Studio (from eWEEK)
Intel’s answer to GPGPU (Larrabee) is purely a C++ solution; oh yeah, did you hear Intel just bought RapidMind, a framework for expressing data-parallel computations from within C++ and executing them on multicore processors?
All this in familiar C++, with libraries up the kazoo to boot… c’mon, guys. Free yourselves from the tyranny of Intel, ahhh, Wintel. You guys have to be really, really good to have the whole Wintel camp plotting AMD’s demise (Et tu, Brute?). Have you thought about bonding with the Linux world and using AMD’s talent and genius to build a new world? Don’t underestimate the potential power of Linux; it’s the core of HPC. Unless, of course, you guys have a rabbit or two in that hat, like speeding up the 32nm process.
asH
Hi!
I tried MediaShow Espresso to test Stream and UVD. But Stream only works when transcoding. UVD status is IDLE in AMD GPU Clock Tool during transcoding and BUSY only in playback mode. Why doesn’t UVD work during transcoding??? CCC 9.11, HD4870.
I also tried to compare transcoding with and without HWA (Stream) in Espresso. But I can’t do it properly, because the output bitrate with Stream is 2500 kbit/s and without Stream it is 5000 kbit/s. So the results differ in time, bitrate and size, and it’s not meaningful to compare the times with such different bitrates.
If I use AsVideoConv_2.1.1.0 (a mod of the ATI transcoder that can enable/disable GPU support), I get identical bitrate and transcoding time. So I conclude that Stream does not speed up transcoding…
Why are there no proper programs that help us make a fair comparison of Stream vs. no Stream and show us the real speed-up of the process?
I’ve bopped ’round AMD’s site, and am an old-timer in the Linux/AMD use category ( since the mid-’90s ).
IF you find that a given comment could increase AMD’s competitiveness, please get the essence of each comment to the AMD person who needs it
( as I don’t have access to your employee-tree… )
1. OpenCL, according to the AMD page I saw, is built for Intel & GCC.
That’s *beyond* daft: the Intel compiler is ENGINEERED to bosh performance on all code *running* on an AMD chip.
OpenCL SHOULD be running on Open64 & GCC, with Intel’s abusive compiler being third, not first/second.
2. web-site: please get your web-meister(ess) to put an AMD-logo in the /favicon.ico for all AMD sub-domains, so that we who use Firefox’s Live Bookmarks to read feeds can SEE which articles we’ve read. ( including blogs.amd.com )
3. Has AMD actually looked at how the mobo-makers are crippling AMD’s position & blocking choice amongst US?
I *CANNOT* buy a 7xPCIe AMD-based “enthusiast” board, or even a workstation-board!
ONLY Intel-based mobos are being made with PCIe all the way across ( that is, all full-size slots: I’ve neither knowledge nor care if any AMD-based board has 7 1x PCIe’s )
How about the NAS/MiniServer market?
Asus makes ECC possible on *all* AMD based mobos, but ALL the microATX boards they make waste 2 slots on old-school PCI!
In the Intel space one has choice, in the AMD-based space, one is prevented from it…
Could AMD *PLEASE* Explicitly_Ask whatever mobo-makers you like to make ECC-enabled microATX/DTX ( more on that in a sec ) mobos with *all* PCIe, straight-across, as PART of their product-line?
4. DTX: could AMD please make a “standard” ( think “Centrino”, except non-bogus ) based on
80-plus-Gold,
95w CPU or less,
2 x16 PCIe’s & 2 x4’s,
DTX,
aluminum drive-rails for cooling ( & therefore longer life, also less shock-transmitting, too )…
MIN 1x 5.25″, 1x 3.5″ external ( card-reader ), 1x 3.5″ internal, 1x 2.5″ ( ssd )
( the reason for “MIN” is because many won’t buy a machine without RAID1 or RAID0 potential )
… so that the industry would begin making the less-wasteful DTX standard available for us, while leaving the door open to Via & Intel to participate ( AMD has shown such integrity in the past, I’d hope this could be
It *isn’t possible* for us individuals to make it happen: someone upstream has to motorvate it.
5. Ever heard of motherboards.org? MOBOT?
*Please* make an equiv, on your site, for every product-category AMD-based products are sold in:
Make it POSSIBLE for countless-us to go to 1 page & specify whatever it is that matters to us ( your AMDCompare.com site prevents this, as it won’t let me compare 4-core & 6-core Opterons side-by-side on the same page, & won’t SHOW me what the difference is between Athlon IIs & Phenom CPUs )
and let us see ALL the available potential, weeding out the stuff that we discover doesn’t match our want, iteratively.
( think Opterons/Desktop-CPUs:
-lists all-
*Minimum* # of Cores: -selector=2/4/6/8/12-
*Minimum* Frequency: -selector=whatever
*Minimum* L3 Cache: -selector=none/1MB/2MB/whatever
etc.
EVERY differentiating item between them listed explicitly )
then, after we’ve set our choices, THEN we hit the COMMIT button ( instead of having 12-15 refreshes while trying to weed it down, as is currently the case )
By setting it as the MINIMUM, instead of “choose ONE of these options”, we can compare ALL the AMD products that potentially meet our requirements, instead of fighting with multiple windows, each showing your “permitted” slice of the solution-space ( offensive to us as hell, wasteful of our hours, frustrating & off-putting. In short, it isn’t nearly as effective in increasing AMD’s marketshare as it could be, by FAR, and I’ve tried getting this into AMD’s understanding, intermittently, for years… FIX it, and you’ll find more people specify your products! )
Anyways, some requirements for the AMDCompare site, IF it is to increase AMD’s revenue as much as it can, are:
1. must show ALL differentiations between products within a REAL category ( Desktop, Opteron ), not within some artificial pseudo-category ( 4-core procs, 6-core procs )
2. must show ALL products that meet specifier’s minimum-requirements, instead of hiding most of ‘em because of only letting us see one single slice.
3. must work for script-disabled browsers ( better than Asus, currently: that’s for sure! if you’re disabled, Asus indicates you should f*ck-off + die, through *their* site’s implementation )
4. must allow one to easily & efficiently weed-down one’s choices until one sees, clearly, what it is that one needs to put on the purchase-orders. ( aka Do Not Waste Specifier’s Life Frustrating Them! )
5. must be available for all AMD-based products
( think of hiring Amazon’s “Mechanical Turk” to update an AMD-Mobot, if you implement one: any correct entry not yet added gets $5 Amazon credit: it’d cost AMD little, pay the poor usable money, AND increase AMD’s market-leverage by making AMD-based-product information findable NOW.
Think of how Google centralized search?
Centralize AMD-search! )
Doing so would swing, gradually, the entire-market.
6. Either ask Fujitsu/Toshiba/Asus/whomever to make higher-end AMD-based notebooks, or contract ‘em to do it, directly.
Currently NO higher-end notebooks are available with AMD CPUs.
( 1920×1200, wide-gamut LED panel, x3 or more CPU, 4+GB RAM… )
Instead of just playing possum, MAKE the choice available…
Shuttle recently was quoted as saying a mere 500 notebooks were sufficient to engage them in making a model…
*in a segment where the industry isn’t producing*, why not AMD instigate it?
7. G34 not being available in 1-socket…
Some loads are RAM-bound, not CPU-bound
( e.g. if I can get the database in RAM, disk-speed just ceased to be the bottleneck, or
if I can get the whole compute-dataset that has to be transformed into RAM, then it’s cache/cache/cache/RAM for the rest of the compute-session )
By indicating the G34 ( 4-channel RAM, bigger quantity, too ) *won’t* be available for single-socket boards, AMD’s shut out some of its potential market.
Hell: if I could buy a single-socket 6-8-core, 3GHz G34, that could take 32GB of RAM, and had enough x16 slots for me to give 1 HD5750 & a pro soundcard ( & maybe a HDMI-in card or 2 ) & 1 RAID controller
to my guest OS
while keeping enough for my main linux system, I’d do it now!
( think *trustworthy* video-editing workstation:
with Linux on the metal, I’ve no worries ’bout trojans, web-attacks, viruses, etc: just use a Linux browser & save everything I need, say from HD stock agencies, in a shared folder: so long as Windows isn’t touching the “metal” OR the ’net, it can’t get wacko on me, can it?! )
Please consider that the G34 mobos, IF available in single-socket config, are going to compete in the “gaming” space, where your current CPUs are at a disadvantage against Intel ( both RAM-speed & GHz: cut 1 down & you can gain that mindshare! ), and the gamers seem to be extremely influential throughout the market!
( not a gamer meself, but the money I put into tech puts me in their crowd lots ).
Anyways, thanks, for years of making the Right Decisions(tm)
( uncrippled chips,
ECC even on “personal” CPUs,
AMD64,
honestly helping the machines-for-the-poor, unlike Intel’s active sabotage of XO until they could compete against it,
brainiac-CPU & speed-demon-GPU, unlike the bizarre reversal embodied in the P4+NVidia paradigm,
helping FOSS honestly & directly
FOSS radeon drivers,
OpenCL,
Open64,
etc. )
Blessings & Excellence,
-Antryg
PS:
bonus-for-no-reason…
“The Definitive Business Plan, 2nd ed” ( Richard Stutely )
“Presenting to Win, 2nd ed” ( Jerry Weissman )
“Stein on Writing” ( Sol Stein )
“Corps Business: the 30 MANAGEMENT PRINCIPLES of the US Marines” ( Forbes senior editor David H. Freedman )
“Organizing from the Inside Out” ( Julie Morgenstern )
“Execution: the Discipline of getting things done” ( Larry Bossidy & Ram Charan )
“The Design of *Everyday* things” ( Donald A. Norman )
of course, “The 7 Habits of Highly Effective Families” ( Stephen R. Covey )
& finally,
The One Rule to Rule Them All:
When faced with a task/job,
Divide it into *dimensions*:
Trying to solve 2 problems superimposed on one another means one is likely to spend *MORE than 2x* as long fighting with them as one would have, had one simply divided ’em into orthogonal dimensions up front!
Cheers!
G34 is targeted at 2P and 4P platforms, but there will be 1P motherboards. Keep in mind that these products are targeted at server environments, so a lot of the “consumer” features that an enthusiast might want will not be supported.