AMD are keen to point out the benefits of hUMA (Heterogeneous Unified Memory Access), and there have been recent comments that it is also present in Sony's PlayStation 4 and Microsoft's Xbox One games consoles. Many questions have been raised as to what hUMA is and how it will benefit the performance of the consoles, and of future APUs.
Before we discuss hUMA further, let's cover the PlayStation 4 and Xbox One's role in all of this. It was first announced by Marc Diana (a senior product marketing manager at AMD) that the Sony PlayStation 4 contains hUMA, and that the Xbox One lacks it. This was quickly corrected by AMD (Advanced Micro Devices), who pointed out that this information was inaccurate and that Marc Diana had spoken in error. Further, they added that they would not comment on devices built for their customers – in this case, Microsoft and Sony, both of whom sought AMD's help for the architecture which powers their respective consoles.
As a quick rundown, both consoles use a custom APU (which contains AMD's Jaguar CPU – two 4-core modules – along with AMD's own GCN GPU technology). Both systems have their own modifications: for example, Mark Cerny (the lead architect behind the PlayStation 4) highlighted some of the changes made for the PS4, such as the improvements to the compute functions of the GPU and the changes to the memory bus. For the Xbox One, it was the 32MB of ESRAM and the 'Move Engines', which help shunt around large pools of data with minimal processor usage.
Heterogeneous Unified Memory Access (known as hUMA) is the next stage in the evolution of Unified Memory Access. For the purposes of explaining what this is, we'll forget for a moment about the caches which are present on both the CPU and GPU portions of the AMD Jaguar APU, and instead focus on main system RAM. In the examples below, please note that other buses (such as CPU and GPU communication) aren't shown; the focus is only on the memory itself.
Microsoft's prior console, the Xbox 360, used UMA, whereas Sony's PlayStation 3 didn't. The Xbox 360 had 512MB of total memory available, and developers were able to split this memory however they liked. A portion of memory is allocated to system functions (in other words, reserved memory), but for the purposes of this article, to make things easier to understand, we'll pretend it doesn't exist. The PlayStation 3 had two 256MB memory sections: 256MB for graphics, and another 256MB for 'game data' – sound, game engine information, AI and so on could be stored right there. The GPU portion would of course handle things such as textures and other materials, such as lighting, which help render the scene. In other words, it's a very 'cut and dried' world. If a games developer only needs 100MB of graphics RAM but uses up all 256MB of system RAM (and wishes they had more), too bad – there's nothing they can do. The rest of the graphics memory is essentially useless, or wasted.
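The split-pool problem can be sketched in a few lines of Python. This is a toy model with invented names, not actual console code: the point is simply that once one fixed pool fills up, the free space in the other pool can't be borrowed.

```python
# Toy model of the PS3-style fixed memory split (hypothetical, for illustration).

class FixedPool:
    def __init__(self, name, size_mb):
        self.name = name
        self.size = size_mb
        self.used = 0

    def alloc(self, mb):
        # Fails when THIS pool is exhausted, even if the other pool has room.
        if self.used + mb > self.size:
            return False
        self.used += mb
        return True

graphics = FixedPool("graphics", 256)
system = FixedPool("system", 256)

graphics.alloc(100)       # the game only needs 100 MB of graphics RAM...
system.alloc(256)         # ...but fills its 256 MB of system RAM completely
print(system.alloc(50))   # False: 156 MB of graphics RAM sits idle, unusable
```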
This is very similar to current PCs. A dedicated GPU typically slots into the motherboard (into a PCI-E slot) and has its own local memory (in other words, RAM on the video card itself). But remember that in the case of PCs, you're dealing with much larger numbers in terms of RAM – for instance, 2GB of video memory and, say, 8GB of system RAM. This means that games developers don't need to worry about how the memory is 'sliced up', because they have so much of it to play with anyway.
The Xbox 360 uses Unified Memory: one large memory pool which can be split up however it's needed. In other words, it's down to the games developers to say "oh, okay – this needs 300MB for graphics, and we only need, say, 200MB for system". It's a much more open system. It does have some limitations, however, and this is where Heterogeneous Unified Memory Access comes in.
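A hypothetical sketch of the unified approach, again with invented names: the 300MB/200MB split described above fits comfortably because the developer is dividing one 512MB pool rather than juggling two fixed ones.

```python
# Toy model of a unified memory pool (hypothetical, for illustration only).

class UnifiedPool:
    def __init__(self, size_mb):
        self.size = size_mb
        self.used = 0

    def alloc(self, mb, purpose):
        if self.used + mb > self.size:
            return False
        self.used += mb
        return True

pool = UnifiedPool(512)
pool.alloc(300, "graphics")   # developer's choice: 300 MB for graphics...
pool.alloc(200, "system")     # ...and 200 MB for system data, same pool
print(pool.size - pool.used)  # 12 MB left over, usable by either side
```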
The memory is still separate blocks of data. Notice above that the green 'memory block' is clearly for the GPU, and the blue for the CPU. If data resides in the CPU block and the GPU needs to access it, it must be copied from one to the other (or vice versa). This means that effectively you're wasting time and space on copies of data. With a hUMA design, the GPU and CPU can both process the same data – so in effect, if the CPU finds itself struggling and decides that the GPU would be better suited to a task, it can ask it to help out.
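The copy overhead can be illustrated with a toy Python model, where plain lists stand in for memory blocks (nothing here is a real GPU API): without shared memory, the data must be copied into the GPU's block, and the results copied back again.

```python
# Toy illustration: separate CPU/GPU memory blocks force round-trip copies.

cpu_block = [1.0, 2.0, 3.0]           # data living in the CPU's memory block

# 1. Copy the data into the GPU's block (time and space spent duplicating it).
gpu_block = list(cpu_block)

# 2. The 'GPU' processes its copy (doubling each value stands in for real work).
gpu_block = [x * 2.0 for x in gpu_block]

# 3. Copy the results back so the CPU can use them.
cpu_block = list(gpu_block)

print(cpu_block)  # [2.0, 4.0, 6.0] - three buffers touched for one result
```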
This is very important, because modern-day graphics cards are capable of processing far more data than a traditional CPU can. GPUs are much better workhorses for compute, and are capable of several TFLOPS of computing power. For instance, the PS4's GPU is capable of around 1.84 TFLOPS. This is slow compared to high-end desktop solutions (such as AMD's own Radeon HD 7970), but is still light years ahead of the 350 GFLOPS or so that a high-end modern Intel CPU (such as an Ivy Bridge or Haswell) can produce. To put it another way, the CPU is very smart, but weak and slow at certain tasks. The GPU is dumb as a post, but fast and powerful as hell – and so a marriage between them makes perfect sense.
AMD have illustrated what they feel the 'top' benefits of HSA are:
- Much easier for programmers
- No need for special APIs
- Move CPU multi-core algorithms to the GPU without recoding for absence of coherency
- Allow finer grained data sharing than software coherency
- Implement coherency once in hardware, rather than N times in different software stacks
- Prevent hard to debug errors in application software
- Operating systems prefer hardware coherency – they do not want the bug reports to the platform
- Probe filters and directories will maintain power efficiency
- Full coherency opens the doors to single source, native and managed code programming for heterogeneous platforms
- Optimal architecture for heterogeneous computing on APUs and SOCs.
So in other words, the CPU now simply points the GPU to the SAME piece of data – no copy operation happens – and the GPU processes that data. The CPU can then read and use the calculated results, which, when you consider the PS4 has numerous changes made specifically for compute, means it'll have some interesting options ahead of it.
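As a toy Python sketch (invented names, not a real hUMA API), the zero-copy version of GPU offload looks like this: the CPU hands the 'GPU' a reference to the same buffer and reads the results in place, with no copies in either direction.

```python
# Toy illustration: with shared, coherent memory there is nothing to copy.

shared_block = [1.0, 2.0, 3.0]        # one buffer, visible to CPU and GPU

def gpu_process(buffer):
    # The 'GPU' works in place on the very same memory the CPU allocated.
    for i, value in enumerate(buffer):
        buffer[i] = value * 2.0

gpu_process(shared_block)             # the CPU points the GPU at the data...
print(shared_block)  # [2.0, 4.0, 6.0] - ...and reads the results directly
```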