The number of titles relying on Asynchronous Compute has been growing since DirectX 12 was first announced. A new rumor has popped up that Nvidia’s upcoming Pascal architecture doesn’t handle Asynchronous Compute much better than Maxwell, and thus will likely still lag behind AMD’s performance.
Nvidia’s GTX 9xx series cards (which are based on Maxwell) currently don’t handle Asynchronous Compute particularly well compared to AMD’s GCN architecture. This fresh report hints that Nvidia simply haven’t changed their compute backend significantly enough to alter this, and are basically hoping to use a more brute-force approach for performance.
AMD’s GCN GPUs can run commands in parallel, and in effect ‘splice’ compute commands in between graphics workloads. This means a queue of commands doesn’t have to wait for another task to ‘finish’ before it can be submitted for processing. This allows the GPU’s shaders to be better utilized, and can be thought of as multi-tasking for the GPU.
This support is thanks to AMD’s ACEs (Asynchronous Compute Engines), which handle this task on behalf of the GPU.
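To make the ‘splicing’ idea above a little more concrete, here is a minimal toy model in Python. It is only a sketch: the function names, the `idle_fraction` parameter and all the millisecond figures are invented for illustration and are not measurements of any real GPU or a description of how the ACEs are actually implemented.

```python
def serial_time(graphics_ms, compute_ms):
    """Graphics and compute run back to back: one must finish before the other starts."""
    return graphics_ms + compute_ms


def async_time(graphics_ms, compute_ms, idle_fraction):
    """Compute work is slotted into shader time the graphics pass leaves idle.

    idle_fraction is a hypothetical share (0.0 to 1.0) of the graphics pass
    during which the shaders would otherwise sit unused.
    """
    gap_ms = graphics_ms * idle_fraction      # idle shader time available
    leftover_ms = max(0.0, compute_ms - gap_ms)  # compute that did not fit in the gaps
    return graphics_ms + leftover_ms


if __name__ == "__main__":
    # Made-up frame: 10 ms of graphics, 3 ms of compute, shaders idle 20% of the time.
    print("serial:", serial_time(10.0, 3.0))
    print("async: ", async_time(10.0, 3.0, 0.2))
```

In this simplified picture the compute work that fits into the idle gaps costs no extra frame time, which is the ‘better utilized shaders’ effect described above.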
Nvidia’s support for Async Compute isn’t quite as clear cut; in many benchmarks Maxwell has shown better performance with the option simply disabled.
A user on the Beyond3D forum constructed a benchmark designed purely to test Asynchronous Compute performance on Maxwell hardware, and he found:
“There were claims originally, that Nvidia GPUs wouldn’t even be able to execute async compute shaders in an async fashion at all, this myth was quickly debunked. What become clear, however, is that Nvidia GPUs preferred a much lighter load than AMD cards. At small loads, Nvidia GPUs would run circles around AMD cards. At high load, well, quite the opposite, up to the point where Nvidia GPUs took such a long time to process the workload that they triggered safeguards in Windows. Which caused Windows to pull the trigger and kill the driver, assuming that it got stuck.
“Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x times what Nvidia GPUs can handle. But they also need also about 4x the pressure applied before they get to play out there capabilities.”
The user (Ext3h) claims that Nvidia’s hardware doesn’t benefit as much from parallel execution because there are fewer “gaps” in its shader utilization to fill. Combined with the fact that Nvidia’s hardware needs to use context switching, performance suffers.
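Ext3h’s argument can be illustrated with a rough back-of-the-envelope sketch. Everything here is an assumption made for the example: the function name `async_gain_ms`, the idea of a fixed per-switch cost, and every number used. None of it comes from the Beyond3D benchmark or from Nvidia.

```python
def async_gain_ms(graphics_ms, compute_ms, idle_fraction,
                  switch_cost_ms=0.0, switches=0):
    """Frame time saved (positive) or lost (negative) by overlapping compute.

    idle_fraction:  hypothetical share of the graphics pass with idle shaders.
    switch_cost_ms: hypothetical cost of one context switch.
    switches:       hypothetical number of context switches per frame.
    """
    serial_ms = graphics_ms + compute_ms
    gap_ms = graphics_ms * idle_fraction
    hidden_ms = min(compute_ms, gap_ms)  # compute that fits into idle gaps
    overlapped_ms = serial_ms - hidden_ms + switch_cost_ms * switches
    return serial_ms - overlapped_ms


if __name__ == "__main__":
    # Large gaps and no switching penalty: overlapping pays off.
    print(async_gain_ms(10.0, 3.0, 0.25))
    # Small gaps plus a switching penalty: overlapping can cost time.
    print(async_gain_ms(10.0, 3.0, 0.125, switch_cost_ms=0.5, switches=4))
```

With these made-up inputs the first case comes out ahead and the second ends up slower than running serially, which matches the pattern of Maxwell sometimes performing better with the option disabled.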
We’ll of course learn whether this rumor is true once the GPUs hit store shelves and, even if it is, how much of a real impact it has on performance (Nvidia could have found a way to somewhat mitigate the problem). We know for certain that after the fairly lackluster FP64 performance from Maxwell, Nvidia have vowed to improve compute performance again and offer FP32 at double performance. These changes alone could make a world of difference.
For AMD, the switch to Asynchronous Compute by developers is a large benefit. Because AMD designed the hardware inside the PlayStation 4 and Xbox One, the fact that developers are creating titles (such as Infamous Second Son) which rely on this new technology means it is more likely to filter to consoles.
For more information on what Asynchronous Compute is and how it runs (particularly on AMD hardware), feel free to check out this in-depth explanation.
Source of rumor: BitsnChips