AMD & Microsoft Announce Heterogeneous C++ AMP Language


AMD (Advanced Micro Devices) has been working with Microsoft on C++ AMP version 1.2, and the result is the first open source C++ AMP compiler. C++ AMP itself is an open specification, so with an open source compiler available, developers on either the Linux or Windows platform can now harness the power of C++ AMP.

This release is a milestone, serving as a step forward in AMD’s support for the open source community. The tool is built on LLVM (which, despite the name, has little to do with traditional virtual machines; the LLVM Project is a collection of modular and reusable compiler and toolchain technologies) and Clang (a compiler front end for C, C++ and other C-based languages). In theory, this will make it easier for developers to create applications and software that take advantage of heterogeneous platforms, which include not only desktop PCs but also servers and handheld devices.

“AMD has a consistent track record of enriching the developer experience, and we’re proud to make the first open source implementation of C++ AMP available to enable greater performance and more power-efficient applications,” said Manju Hegde, corporate vice president, Heterogeneous Applications and Solutions, AMD. “The cross-platform release is another step in strengthening AMD’s developer solutions, allowing for increased productivity and accelerated applications through shared physical memory across the CPU and GPU on both Linux and Windows.”



“AMD continues to deliver excellent developer tools for heterogeneous programming. Partnering with AMD to deliver C++ AMP to the Linux and Open Source communities was a natural step for Microsoft as we work to improve the performance and developer experience on modern computing platforms,” said S. Somasegar, corporate vice president of the Developer Division at Microsoft.

Crucially, this isn’t specific to AMD hardware (unlike, say, Mantle); it can also be used on competitors’ hardware, including Intel CPUs/APUs and even Nvidia GPUs. The key requirement is OpenCL compliance: if the hardware supports OpenCL, you’re almost certainly good to go.

As some of you might be aware, one of the key advantages of heterogeneous systems is a shared memory architecture (AMD’s own hUMA, for example). Similar designs are found in current-generation consoles, and the basic principle is fairly simple to understand: the memory is shared by both the CPU and GPU, so both can access the same data whenever it is required. There is still some latency associated with ‘unlocking’ the data (in other words, the CPU telling the GPU ‘okay, you can work on this now’), but it’s still far faster than the traditional approach of copying data back and forth between separate CPU and GPU memory pools, especially for compute workloads.
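To make the difference concrete, here is a minimal sketch in plain standard C++ that simply counts the bytes copied in each model. It assumes a std::vector stands in for each memory pool and a hypothetical kernel function stands in for GPU work; it is not C++ AMP code and does not touch a GPU. (In actual C++ AMP code, this data movement is normally managed through the concurrency::array_view class rather than by hand.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compute kernel: doubles every element.
// Plain C++ only; this is not C++ AMP or any real GPU API.
void kernel(std::vector<float>& data) {
    for (float& x : data) x *= 2.0f;
}

// Traditional discrete model: the GPU has its own memory pool, so the input
// is copied to "device" memory and the results are copied back afterwards.
// Returns the number of bytes moved between the two pools.
std::size_t run_discrete(std::vector<float>& host) {
    std::vector<float> device(host);           // host -> device copy
    kernel(device);                            // compute on the device copy
    host = device;                             // device -> host copy
    return 2 * host.size() * sizeof(float);    // both copies counted
}

// Shared-memory model (the idea behind hUMA): CPU and GPU address the same
// buffer, so the "GPU" works on the data in place. Real hardware still needs
// a synchronization step to hand the buffer over, but no bytes are copied.
std::size_t run_shared(std::vector<float>& shared) {
    kernel(shared);                            // work directly on shared data
    return 0;
}
```

For a three-element buffer with 4-byte floats, run_discrete reports 24 bytes moved while run_shared reports none, and both leave identical values in the buffer. The model deliberately omits the ‘unlocking’ latency described above.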

With DDR4 set to become more mainstream, the additional memory bandwidth will certainly be crucial for servers and low-power devices built around shared-memory designs. High-end GPUs will of course keep the more traditional separate memory pools because of their high bandwidth requirements (usually GDDR5 on a wide memory bus). AMD’s C++ AMP 1.2 implementation fully supports shared memory, greatly simplifying programming and data sharing between the CPU and GPU.

PlayStation 4 hUMA memory architecture example.
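The memory bandwidth differences discussed above come down to simple arithmetic: theoretical peak bandwidth is the effective transfer rate multiplied by the bus width in bytes. The sketch below computes it in standard C++; the function name is our own, and the configurations in the usage note are illustrative theoretical peaks, not measured throughput.

```cpp
#include <cassert>

// Theoretical peak bandwidth in GB/s (decimal, 10^9 bytes per second):
// effective transfer rate (MT/s) * bus width in bytes, then MB/s -> GB/s.
double peak_bandwidth_gbs(double mega_transfers_per_s, int bus_width_bits) {
    assert(mega_transfers_per_s > 0 && bus_width_bits > 0);
    double bytes_per_transfer = bus_width_bits / 8.0;
    return mega_transfers_per_s * bytes_per_transfer / 1000.0;
}
```

For example, dual-channel DDR3-1600 on a 128-bit bus gives peak_bandwidth_gbs(1600, 128) = 25.6 GB/s, dual-channel DDR4-2133 gives about 34.1 GB/s, and GDDR5 at 5500 MT/s on a 256-bit bus (the configuration widely reported for the PlayStation 4) gives 176 GB/s. The wide bus is as important as the memory type itself.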

The compiler supports the following targets:

  • Khronos Group OpenCL, supporting AMD CPU/APU/GPU, Intel CPU/APU, NVIDIA GPU, Apple Mac OS X and other OpenCL compliant platforms;
  • Khronos Group SPIR, supporting AMD CPU/APU/GPU, Intel CPU/APU and future SPIR compliant platforms; and
  • HSA Foundation HSAIL, supporting AMD APU and future HSA compliant platforms.

This is particularly interesting given that DirectX 12 is due next year (though the exact date is still up in the air), and rumors suggest it will feature more robust compute support. Between DX12, Valve’s commitment to Linux gaming and the rise of virtual reality, a lot is happening, and it all ties in nicely with AMD’s C++ AMP work. DX12 will better leverage multiple CPU cores, since rendering work can be split more evenly across them, and in theory this will demand more memory bandwidth.
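Much of DX12’s multi-core advantage comes from letting several threads record rendering commands in parallel, each into its own command list, which are then submitted together. The sketch below models that idea in standard C++ with std::thread. CommandList and record_frame are hypothetical names, an int stands in for a draw call, and none of this is Direct3D 12 code.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical model of multi-threaded command recording, loosely inspired by
// DX12's per-thread command lists. A "draw call" is just an int ID here;
// none of these names are part of the Direct3D 12 API.
using CommandList = std::vector<int>;

std::vector<int> record_frame(int draw_calls, unsigned workers) {
    assert(workers > 0);
    std::vector<CommandList> lists(workers);   // one list per worker, no sharing
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        // Contiguous slice [begin, end) of the frame's draw calls for worker w.
        int begin = static_cast<int>(static_cast<long long>(draw_calls) * w / workers);
        int end = static_cast<int>(static_cast<long long>(draw_calls) * (w + 1) / workers);
        threads.emplace_back([&lists, w, begin, end] {
            for (int id = begin; id < end; ++id)
                lists[w].push_back(id);        // "record" into this worker's list
        });
    }
    for (std::thread& t : threads) t.join();   // wait until every list is recorded

    // Submit the lists in a fixed order so the final draw order is deterministic.
    std::vector<int> submitted;
    for (const CommandList& list : lists)
        submitted.insert(submitted.end(), list.begin(), list.end());
    return submitted;
}
```

Because each worker writes only to its own list, no locking is needed during recording, and the fixed submission order keeps the result identical to single-threaded recording: record_frame(10, 3) returns the IDs 0 through 9 in order.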

It’s a very exciting time, and we can clearly see that long-standing hardware and software limitations in gaming and general computing (such as DX11’s limitations or DDR3’s memory bandwidth) will finally start to be resolved. This ties in rather well with AMD’s open source C++ AMP 1.2 compiler.