The age of the APU is firmly here – and for low power devices, it brings a plethora of various benefits.
An APU (or, to give it its full name Accelerated Processing Unit) integrates the CPU and GPU on one die. The idea behind this really started to hit home after AMD bought out / merged with ATI. AMD then started a new project by the name of fusion. The concept behind this merger is to reduce the number of components, along with heat and power, while bringing performance up along with it.
AMD Fusion (as it was formerly known) was rebranded recently to AMD Accelerated Processing Unit. It’s a type of SoC (System on Chip). For home PC systems, there are obvious issues with this – including limiting the upgrade path, however, for cheaper systems, laptops and other lower power devices, an APU starts to make increasing sense.
It’s becoming increasingly more common tasks are being given to the GPU that aren’t just graphics. The nature of a GPU is such that it’s extremely good with parrel calculations. Certain applications on your PC take advantage of these heavily. Video editing applications, such as Adobe Premiere, are a fantastic example of the GPU being used for more than just game graphics’. Hardware Physx (using Nvidia’s Cuda technology that you swould see on say their Geforce cards) are another.
Sony’s Playstation 4 APU has been constructed to allow the GPU to help the CPU out with certain calculations when required, via use of the various GCN cores. The PS4 sports 18 of these GCN cores, which combined put out around 1.84Teraflops of compute power. The CPU part of the APU package is the weaker side of things. The AMD Jaguar uses 8 cores, clocked at 1.6GHZ. The Jaguar is itself, based on a redesign and improvement of AMD’s Bobcat.
The APU drastically improves data flow, eliminating the need for the data to ‘flow out’ and around the motherboard. It allows huge amounts more data to be accessed by the GPU/CPU. The thoughts behind it are simple – to combine a general purpose CPU cores with programmable vector processing engines. Or, to put it into another way – the PS4’s APU will be based on technology that ties CPU Scalar processing with high performance Parallel Processing.
In the AMD APU solution (so that would be for example, the Playstation 4’s Jaguar), a fully programmable X86 (although the PS4 version also combines X64 instructions too) CPU with the vector processing power of a GPU is tied together on a single piece of silicon by a super high performance bus. They are both are able to access the RAM (in the Playstation 4’s case, the 176GB/S of the GDDR5). The Silicon of the APU generally contains other important system elements, for example the I/O controller and memory controllers. A lot of the functionality normally placed on the northbridge, thus reducing the bandwidth needed to shuttle data around the board.
Vector processors have been built into GPU’s for awhile now, and are made up of thousands upon thousands of compute cores. These compute cores (known as shaders) each can operate simultaneously and can be fully programmed.
The Playstation 4’s Jaguar APU is the most complicated that AMD have built so far – and AMD are planning to sell cut down versions of the core to computer vendors (although currently, they’ll only be 4 cores. Also, obviously lacking the Sony specific changes).
Some numerically intensive problems lend themselves to parallel algorithms, and others don’t. When a machine optimized for parallel computation encounters a problem that cannot be computed in a parallel manner, the machine operates as an inefficient scalar processor, and most of its parallel computing resources sit idle. Conversely, a processor optimized for scalar calculations cannot exploit the parallelism in many algorithms, and thus is limited by its scalar processing speed.
As said early, the playstation 4 uses the Jaguar APU. The Jaguar offers numerous improvements over the Bobcat core used in the company’s existing “Brazos” platform. Jaguar not only doubles the amount of cores, from two to four, but also the size of Level 2 cache to 2MB, and it makes the cache shareable among all the cores.
What are Scalar CPU’s?
Scalar Processors (so, the CPU “Jaguar” part of the APU inside the Playstation 4) operate on arrays of data, element by element. let’s say an application needs to say add a 1000 elements in vector A to a separate list of 1000 in Vector B, and then store the combination of these results in Vector C. It will generally set pointers to each of the vectors, and load the ‘values’ pointed to by both A and B into the CPU registers. It will then add those together and then store them in the location of C. However, to do all of this it would require the CPU to complete this work 1000 times!
So what about Vector Processors – like in the PS4’s GCN Units
Advanced GPU’s have hundreds of units available to calculate instructions, and can all operate simultaneously. Let’s assume the same problem as before, and the application in question wants to add two one-thousand element vectors using just say 10 of the processing units on the GPU. The software will then send the work to each of the units at the same time, so they’ll all be working to calculate the results in around a tenth of the time that the CPU would… but like most things in IT, it’s never quite that easy.
You’ve also got to consider several other variables – such as the time it takes to set these operations up to begin with (after all, it’s not just magic that things work), time taken to ensure that the calculations have been done, and correctly too, and the time taken to actually move the data around from the Scalar CPU’s RAM and the RAM of the GPU. In other words, latency introduced from the simple act of the two devices telling each other what the other one is doing.
Because APU’s combine the two together on the same die, the APU doesn’t have these issues. Sony have worked with AMD in constructing an absolute monster of a die. The Playstation 4’s APU will be able to move data around extremely quickly, with the GCN cores tightly integrated with the CPU, both with direct access to the same memory controller. In the case of the Playstation 4 (which obviously will require plenty of bandwidth to keep the CPU and GPU satisfied) – the systems designers went with GDDR5 RAM. The CPU and GPU will both be able to be used to compliment the weaknesses of the other.
What about non Playstation 4 uses though? Well, eventually we’ll see Accelerated Processing Units becoming more and more common place. Discrete graphic cards aren’t going anywhere in a hurry though – and they’ll likely be easily integrated to perform calculations with games. Right now, we’re still a long way of getting to the point of full integration – a lot of this technology is still early, the libraries and software however to harness the potential are getting better.
Games will benefit a lot from this, far better physics, improved rendering, better video out put and more. The playstation 4’s CPU could possibly have stood to be a little beefier, although the integrated APU SoC (System on Chip) design will help – a lot.