With Windows 10’s release drawing ever closer, DirectX 12 news continues to pop up thick and fast. A few days ago, AMD revealed that Asynchronous Shaders (which are part of the DirectX 12 specification), together with the Asynchronous Compute Engines of the Graphics Core Next (GCN) architecture powering Radeon graphics cards, can improve the performance of DX12 games by as much as 46%. Since then, team red have revealed further information on how this works, as well as additional information on how Multi-Threading functions in D3D12.
Asynchronous Shaders Explained
Essentially, modern GPUs are collections of hundreds or even thousands of processors working together to perform a task. In the case of AMD’s GCN architecture, 64 shaders (along with cache and other components) form a single Compute Unit (CU), and many CUs form the basis of a single GPU. The R9 290X, for example, contains 44 Compute Units, meaning a total of 2816 shaders are available on the GPU. These shaders must be fed data, and this is handled either by the single Graphics Command Processor (GCP) or by the Asynchronous Compute Engines (ACEs).
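The shader arithmetic above can be written out as a tiny Python sketch. The `GcnGpu` class and the `SHADERS_PER_CU` constant are hypothetical names chosen for illustration; the only figures assumed are GCN's 64 shaders per Compute Unit and the R9 290X's 44 Compute Units.

```python
from dataclasses import dataclass

# GCN groups 64 shaders (stream processors) into each Compute Unit.
SHADERS_PER_CU = 64


@dataclass(frozen=True)
class GcnGpu:
    """Hypothetical description of a GCN GPU by its Compute Unit count."""
    name: str
    compute_units: int

    @property
    def shaders(self) -> int:
        """Total shader count = Compute Units x shaders per CU."""
        return self.compute_units * SHADERS_PER_CU


r9_290x = GcnGpu("R9 290X", compute_units=44)
print(r9_290x.name, r9_290x.shaders)  # R9 290X 2816

# A hypothetical 32-CU part, for comparison.
print(GcnGpu("32-CU example", compute_units=32).shaders)  # 2048
```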
There are some subtle variations between GPU families: for example, some of AMD’s GPUs have a greater number of ACEs, and the Xbox One is confirmed to contain two GCPs (most likely the second is for OS use rather than for games).
The Graphics Command Processor handles work from the graphics queue (in other words, what the game wants drawn on screen), while the ACEs handle the compute queues. A greater number of ACEs allows the GPU to process more compute tasks simultaneously, increasing the efficiency of the graphics card and therefore improving performance.
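To illustrate why more ACEs can help, here is a deliberately simplified Python model; it is not a description of how GCN hardware actually schedules work. It assumes the single GCP issues one graphics job per time step and each ACE issues one compute job per step. `steps_to_drain` is a hypothetical helper.

```python
from collections import deque


def steps_to_drain(graphics_jobs: int, compute_jobs: int, num_aces: int) -> int:
    """Toy model: per time step, the single GCP issues one graphics job and
    each ACE issues one compute job. Returns the steps needed to empty both queues."""
    if num_aces < 1:
        raise ValueError("need at least one ACE")
    gfx = deque(range(graphics_jobs))
    comp = deque(range(compute_jobs))
    steps = 0
    while gfx or comp:
        if gfx:
            gfx.popleft()                  # GCP: one job from the graphics queue
        for _ in range(min(num_aces, len(comp))):
            comp.popleft()                 # each ACE: one job from the compute queue
        steps += 1
    return steps


for aces in (1, 2, 8):
    print(aces, steps_to_drain(graphics_jobs=4, compute_jobs=16, num_aces=aces))
# prints: 1 16, then 2 8, then 8 4
```

With four graphics jobs and sixteen compute jobs, the compute queue is the bottleneck until there are enough ACEs to keep pace with the graphics queue; beyond that point, extra ACEs make no difference in this model.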
The biggest enemy of GPUs is latency: in other words, ‘gaps’ in the shader pipeline where the GPU is not doing anything. Developers have struggled with GPU latency for some time now, but the blame lay less with the GPU than with the API (in this case, DirectX 11), which wasn’t good at handling tasks in parallel (at the same time).
Just as D3D11 wasn’t very good at multi-threading on CPUs, it also wasn’t very good at thinking in parallel on GPUs, which is pretty insane considering parallelism is the GPU’s strongest point. Instead, commands were processed serially, one at a time. The ACEs would attempt to schedule work even while the GPU was still handling a rendering task, and eventually that work would be carried out.
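A rough way to see the cost of serial submission is to measure how much of the GPU's shader capacity sits unused. The sketch below is a toy model under stated assumptions: the task names, shader counts and durations are invented, and each task is assumed to occupy a fixed number of shader "slots" for its whole duration.

```python
def serial_utilization(tasks, capacity):
    """Toy model of serial submission: tasks run strictly one after another.

    Each task is (name, shaders_used, duration). Returns (total_time, utilization),
    where utilization is the fraction of available shader-time doing useful work."""
    if not tasks:
        raise ValueError("need at least one task")
    for name, used, _ in tasks:
        if not 0 < used <= capacity:
            raise ValueError(f"{name}: shaders_used must be in (0, capacity]")
    total_time = sum(duration for _, _, duration in tasks)
    busy = sum(used * duration for _, used, duration in tasks)
    return total_time, busy / (capacity * total_time)


# Hypothetical frame: no single task fills all 100 shader slots.
frame = [("shadow_map", 40, 2.0), ("lighting_compute", 30, 1.0), ("post_fx", 50, 1.0)]
print(serial_utilization(frame, capacity=100))  # (4.0, 0.4)
```

In this made-up frame, 60% of the shader capacity is idle for the whole frame, simply because each task waits for the previous one to finish.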
Problems arise when you consider that certain tasks have a higher ‘priority’ than others, and that work already under way couldn’t be interrupted. Developers therefore started to push towards PreEmption, which allows tasks with a higher priority (set either manually or automatically) to be processed ‘first’, while less time-sensitive tasks are forced to wait until that work is completed. GPUs handle this by means of Context Switching. PreEmption is often better than an ‘every task for itself’ approach, but that’s not to say it doesn’t have inherent problems.
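Priority-based PreEmption of this kind can be sketched as a time-stepped scheduler in Python. This is a hypothetical illustration that uses `heapq` as a priority queue, not the scheduling logic of any real driver or GPU; it assumes integer time units, unique task names, and that the most urgent ready task always runs.

```python
import heapq


def run_with_preemption(tasks):
    """Toy priority scheduler with pre-emption.

    tasks: list of (arrival, priority, name, duration) with integer times and
    positive durations; a lower priority value is more urgent. At every time unit
    the most urgent ready task runs, so an urgent arrival interrupts (pre-empts)
    whatever was running. Returns the per-unit trace (None = GPU idle)."""
    if any(duration < 1 for *_, duration in tasks):
        raise ValueError("durations must be positive integers")
    pending = sorted(tasks)                       # ordered by arrival time
    remaining = {name: duration for _, _, name, duration in tasks}
    ready = []                                    # min-heap of (priority, arrival, name)
    trace, t, i = [], 0, 0
    while i < len(pending) or ready:
        # Move every task that has arrived by time t into the ready heap.
        while i < len(pending) and pending[i][0] <= t:
            arrival, priority, name, _ = pending[i]
            heapq.heappush(ready, (priority, arrival, name))
            i += 1
        if not ready:
            trace.append(None)                    # nothing to run: GPU idle
        else:
            _, _, name = ready[0]                 # most urgent ready task runs
            trace.append(name)
            remaining[name] -= 1
            if remaining[name] == 0:
                heapq.heappop(ready)              # finished: leave the queue
        t += 1
    return trace


# A long, low-priority task is interrupted by a short, urgent one arriving at t=1.
print(run_with_preemption([(0, 2, "terrain", 4), (1, 0, "ui_overlay", 2)]))
# ['terrain', 'ui_overlay', 'ui_overlay', 'terrain', 'terrain', 'terrain']
```

The urgent task finishes quickly, but the long task is split in two, and each change of running task in the trace is a point where a real GPU would have to perform a Context Switch.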
Because DirectX 11 is essentially serial in its thinking, PreEmption can cause a lot of idle time whenever a Context Switch occurs (at its most basic, a Context Switch is the processor saving the results of one task and switching to another to begin processing that task). This time where the GPU isn’t processing data is wasted performance, and it can also create stutter (variance in frame rate or frame time).
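The cost of Context Switching can also be put into numbers. In this toy Python sketch, the `frame_cost` helper, the task names and the switch cost of 0.5 time units are all assumptions: each switch adds idle time, so frames that are interrupted more often take longer, and the spread of frame times grows.

```python
from statistics import mean, pstdev


def frame_cost(trace, switch_cost):
    """Toy model: each slot in `trace` is one unit of useful GPU work, and every
    change of task is a context switch costing `switch_cost` units of idle time.
    Returns (frame_time, idle_time)."""
    switches = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    idle = switches * switch_cost
    return len(trace) + idle, idle


# Hypothetical frames, each with six units of useful work but different interruptions.
frames = [
    ["draw"] * 6,                                    # never pre-empted
    ["draw", "draw", "ui", "draw", "draw", "draw"],  # 2 context switches
    ["draw", "ui", "draw", "ui", "draw", "draw"],    # 4 context switches
]
times = [frame_cost(f, switch_cost=0.5)[0] for f in frames]
print(times)                        # [6.0, 7.0, 8.0]
print(mean(times), pstdev(times))   # 7.0 and roughly 0.82
```

The standard deviation of the frame times is a simple stand-in for the frame time variance that players perceive as stutter.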
AMD therefore believes that DirectX 12 and Async Shaders (which are part of the DX12 spec) counter this “by interleaving these tasks across multiple threads to shorten overall render time.” According to AMD, its ACEs are well suited to leveraging DirectX 12’s Async Shaders and can segment the workload efficiently.
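The benefit of interleaving can be approximated with a simple overlap model. This Python sketch assumes graphics work leaves a fixed share of the shaders idle and that compute work can use that idle capacity with no contention for caches or memory bandwidth. Both helper names are hypothetical, and the numbers are illustrative only, bearing no relation to AMD's 46% figure.

```python
def serial_render_time(gfx_time: float, compute_work: float) -> float:
    """Serial (D3D11-style) model: compute only starts once graphics work is done."""
    return gfx_time + compute_work


def interleaved_render_time(gfx_time: float, compute_work: float, gfx_share: float) -> float:
    """Toy model of asynchronous compute.

    While graphics runs it occupies only `gfx_share` of the shaders; compute work
    (measured in full-GPU time units) soaks up the idle remainder at a rate of
    (1 - gfx_share). Whatever is left afterwards runs on the whole GPU."""
    if not 0.0 <= gfx_share <= 1.0:
        raise ValueError("gfx_share must be between 0 and 1")
    overlapped = min(compute_work, gfx_time * (1.0 - gfx_share))
    return gfx_time + (compute_work - overlapped)


serial = serial_render_time(10.0, 3.0)                  # 13.0
interleaved = interleaved_render_time(10.0, 3.0, 0.75)  # 10.5
print(serial, interleaved, round(serial / interleaved, 2))  # 13.0 10.5 1.24
```

The gain in this model depends entirely on how much capacity the graphics work leaves idle: with `gfx_share=1.0` nothing overlaps and both times are equal.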
According to AMD’s Robert Hallock, “A developer doesn’t need to change the way they write their shaders to use AS [Asynchronous Shaders], so it’s relatively easy to extract gains on AMD hardware. It’s part of the core DX12 spec, so it’s not even something that needs to be specifically added to an engine. You support DX12, you have it.”
In the first part of our analysis, we discussed how console games such as Infamous Second Son were already taking advantage of Async Shaders, and how PC titles (such as Thief) were slowly starting to do the same. As Robert points out: “This is one of many cases where the consoles are improving the performance and flexibility of the PC.”