For months we’ve been reading about the supposed performance benefits of DirectX 12 with no real way of testing anything for ourselves – and now Futuremark have finally released an update for their 3DMark benchmarking software. This update provides a way of testing ‘API Overhead’, and the results are rather startling; they demonstrate precisely why Brad Wardell and others in the games industry have been so excited about the slew of new APIs being released.
If you remember, Futuremark’s involvement in the world of DirectX 12 isn’t new – they were the first Microsoft partner to show live code running on the new API back when it was unveiled in 2014.
The test is purely synthetic – so it’s important to remember that the results aren’t going to accurately mirror games (after all, real game engines saddle both the CPU and GPU with other pesky tasks such as AI and physics). The 3DMark API Overhead Feature Test is designed to measure one thing – how many draw calls per second the system can handle before the frame rate drops below a playable level.
If you’re expecting a lavish test showing the camera sweeping through some frightening and delightful alien landscape, you’ll be slightly disappointed. Instead, the API Overhead Feature Test presents you with a lot of ‘blocks’ on screen. These unique blocks appear at both the top and bottom of the screen (perhaps reminding you a little of the movie Inception) in ever-increasing numbers.
The test is NOT designed to punish the GPU’s shaders or memory bandwidth; instead it squarely places the focus on two components – the system’s CPU and the video card’s Graphics Command Processor.
We’ve discussed draw calls (and pretty much all of the other DX12 features) a few times before – but in a nutshell, before the GPU can render an object, the CPU must issue a command telling it to do so; in other words, the CPU tells the GPU to draw a block, and every block on screen costs at least one call.
The GPU’s GCP (Graphics Command Processor) meanwhile must do something a bit different – it must accept those instructions and then issue them to the rest of the GPU. If you’re thinking “how bad can that be?”, cast your minds back to March 2015’s GDC, where Microsoft’s Max McMullen pointed out that the new API can actually flood the GCP and cause it to ‘choke’.
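To make that division of labour concrete, here’s a toy sketch of the pipeline – entirely our own illustration, with an invented queue depth, submission rate and drain rate that bear no relation to real hardware – showing a CPU flooding a bounded command queue faster than a GCP-like consumer can empty it:

```python
# Toy model of the CPU -> Graphics Command Processor (GCP) pipeline.
# All the constants below are invented for illustration only.
from collections import deque

GCP_QUEUE_CAPACITY = 1024    # hypothetical command queue depth
CPU_SUBMITS_PER_TICK = 16    # hypothetical draw calls the CPU issues per tick
GCP_DRAINS_PER_TICK = 8      # hypothetical commands the GCP forwards per tick

queue = deque()
stalled = 0

def cpu_submit(draw_call):
    """CPU side: issue one draw command if the GCP queue has room."""
    global stalled
    if len(queue) < GCP_QUEUE_CAPACITY:
        queue.append(draw_call)
    else:
        stalled += 1  # the GCP has 'choked'; the CPU must wait

for tick in range(1000):
    for i in range(CPU_SUBMITS_PER_TICK):
        cpu_submit(f"draw block {tick}:{i}")
    for _ in range(min(GCP_DRAINS_PER_TICK, len(queue))):
        queue.popleft()  # GCP side: hand the command on to the GPU proper

print(f"stalled submissions: {stalled}")  # non-zero means the GCP choked
```

The point is simply that draw call throughput is capped by whichever side is slower – lighten the CPU’s per-call cost (as the new APIs do) and the bottleneck can shift towards the GCP.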
There are four APIs available for the test – DirectX 11 Single Thread, DirectX 11 Multi Thread, AMD’s Mantle API and finally DirectX 12.
DX11’s multi-threading was never particularly stellar – and the benefit of using it over a single DX11 thread (as you’ll see in these tests) isn’t particularly noticeable (in some cases, it’s actually a little worse!).
Testing Using API Overhead Feature Test
We decided to run four graphics cards for the test: AMD’s R9 290X and R9 285, and Nvidia’s GTX 760 and GTX 780 Ti. The drivers are the latest betas available for each GPU as of March 30th, 2015 (performance may change in the future, naturally).
For the benchmarking rig, we’re running the latest version of the Windows 10 Technical Preview (as of the same date) with an Intel Core i7-4770K CPU, slightly overclocked to 4.2GHz.
We ran the CPU in a variety of configurations, including disabling two or three of its cores and switching Hyper-Threading off. The idea is to keep all other system conditions as identical as possible (IPC, memory performance, motherboard and so on).
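As an aside, a quick way to sanity-check the active configuration between runs is a few lines of Python – this is just a convenience script of our own (it assumes the third-party psutil package), not part of 3DMark:

```python
# Confirm the enabled core count and Hyper-Threading state before a run.
# Assumes the third-party psutil package (pip install psutil).
import psutil

physical = psutil.cpu_count(logical=False)  # enabled physical cores
logical = psutil.cpu_count(logical=True)    # hardware threads seen by the OS

ht_state = "ON" if physical and logical and logical > physical else "OFF"
print(f"{physical} core(s), Hyper-Threading {ht_state}")
```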
The results (spoiler alert) are startling – with AMD’s R9 290X, the single-threaded DirectX 12 performance (only one CPU core enabled and HT off) is about four times that of DirectX 11 with all cores and HT available (a quick sanity check follows the table below).
All results are in draw calls per second.

| CPU Configuration | Radeon R9 290X DX11 | GeForce GTX 780 Ti DX11 | Radeon R9 290X DX11 MT | GeForce GTX 780 Ti DX11 MT | Radeon R9 290X Mantle | Radeon R9 290X DX12 | GeForce GTX 780 Ti DX12 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Four Cores + HT | 1,030,741 | 1,335,207 | 1,004,928 | 2,280,942 | 16,513,046 | 17,600,402 | 11,864,278 |
| Four Cores + HT OFF | 863,541 | 1,281,425 | 978,970 | 1,252,871 | 13,625,674 | 14,000,385 | 12,240,232 |
| Two Cores + HT | 936,445 | 1,211,657 | 876,738 | 1,284,330 | 9,698,150 | 10,177,860 | 8,846,758 |
| Two Cores + HT OFF | 921,709 | 1,357,632 | 968,390 | 1,342,967 | 7,626,373 | 8,038,554 | 7,261,198 |
| One Core + HT OFF | 555,387 | 657,969 | 542,083 | 634,110 | 4,139,469 | 4,156,122 | 3,688,793 |
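Here’s that four-times claim checked directly against the R9 290X figures in the table above – a trivial calculation, but worth seeing with the numbers side by side:

```python
# R9 290X results from the table above (draw calls per second).
dx11_four_cores_ht = 1_030_741   # DirectX 11 single-thread, four cores + HT
dx12_one_core      = 4_156_122   # DirectX 12, one core, HT off

ratio = dx12_one_core / dx11_four_cores_ht
print(f"DX12 on one core vs DX11 on four cores + HT: {ratio:.1f}x")  # ~4.0x
```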
As you can see from the high-end GPU testing, a rather curious pattern begins to emerge. In terms of raw draw calls, DirectX 12 generally has a slight edge over Mantle on the AMD GPUs. And while having just a single CPU core enabled massively impacts performance, it’s still rather startling how much better DX12 handles draw calls than DX11. We’re not suggesting that you’ll want to rush out and purchase a low-end CPU, but clearly the API has undergone rather vast performance improvements and does a heck of a lot better than DX11 ever did.
What’s rather interesting is Hyper-Threading (particularly in the four-core configuration): with Nvidia’s GPUs there’s a small but clear loss in pure draw call performance under DX12, while AMD’s Radeon range gets a pretty hefty increase from HT. Whether this is down to architecture, driver maturity, testing conditions or a bug in Windows 10 is a bit of a mystery and will require further investigation – particularly when the first DX12 games finally land on store shelves.
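Pulling the four-core DX12 numbers from the table above makes the split plain:

```python
# Four-core DX12 results from the table above (draw calls per second).
results = {
    "R9 290X":    {"HT ON": 17_600_402, "HT OFF": 14_000_385},
    "GTX 780 Ti": {"HT ON": 11_864_278, "HT OFF": 12_240_232},
}

for card, scores in results.items():
    delta = (scores["HT ON"] - scores["HT OFF"]) / scores["HT OFF"] * 100
    print(f"{card}: Hyper-Threading changes DX12 throughput by {delta:+.1f}%")
# R9 290X: +25.7%, GTX 780 Ti: -3.1%
```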
| CPU Configuration | Radeon R9 285 DX11 | GeForce GTX 760 DX11 | Radeon R9 285 DX11 MT | GeForce GTX 760 DX11 MT | Radeon R9 285 Mantle | Radeon R9 285 DX12 | GeForce GTX 760 DX12 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Four Cores + HT | 998,180 | 1,272,045 | 1,013,809 | 2,420,866 | 16,856,522 | 17,360,137 | 10,530,041 |
| Four Cores + HT OFF | 1,030,972 | 1,355,963 | 991,537 | 1,487,011 | 13,602,690 | 14,506,703 | 10,839,023 |
| Two Cores + HT | 540,529 | 1,156,490 | 678,352 | 1,279,794 | 9,571,247 | 10,355,599 | 9,095,858 |
It’s a pretty similar story with the mid-range – the GTX 760 shows a tiny decrease with Hyper-Threading enabled on the Core i7-4770K, while the Tonga-based R9 285 says “thank you very much for the extra threads!”. The DirectX 11 single- and multi-threaded results are also all over the shop – and aren’t that reliable. Re-running the same test can produce quite a fluctuation (how much depends on the configuration), so we went with the higher average achieved across our testing.
No matter which way you look at it, in these tests (and bear in mind we’re not running a GTX 980) AMD’s GPUs are achieving a staggering 17x draw call increase with DX12. Nvidia’s cards sit at roughly eight to nine times – still rather large (their DX11 numbers are also somewhat better to begin with, and we’ve a feeling some of this is driver related). The quick calculation below shows where those multipliers come from.
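Comparing each card’s DX12 result to its single-threaded DX11 result in the matching four-cores-plus-HT configuration, straight from the two tables:

```python
# DX12 vs single-threaded DX11, both in the four-cores-plus-HT
# configuration, using figures from the two tables above.
cards = {
    "R9 290X":    (17_600_402, 1_030_741),
    "R9 285":     (17_360_137, 998_180),
    "GTX 780 Ti": (11_864_278, 1_335_207),
    "GTX 760":    (10_530_041, 1_272_045),
}

for card, (dx12, dx11) in cards.items():
    print(f"{card}: DX12 handles {dx12 / dx11:.1f}x the draw calls of DX11")
# R9 290X ~17.1x, R9 285 ~17.4x, GTX 780 Ti ~8.9x, GTX 760 ~8.3x
```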
The conclusions are pretty clear – running the synthetic API overhead test in 3DMark shows a rather profound difference in API performance. It’s not exactly real-world, but it serves to illustrate to a wider audience (since you can try it yourself, provided you’re willing to install the Windows 10 preview and have the right hardware) the benefits of a new and shiny API.
We’re still going to be taking an in-depth look at Vulkan – that was meant to happen this week, but the DX12 testing took a bit longer than anticipated because of a few technical snags. It’ll be interesting to see how Vulkan compares, particularly when one considers the rather tantalizing bonus it offers developers: the ability to target multiple operating systems with ease. That’s a notable shortcoming of DirectX 12 – develop a game for it, and you’re pretty much locked into the Windows ecosystem.
With game engines (such as Unity) already adding support for Microsoft’s new baby, and news that we’ll have DX12 games before Christmas, industry support for the fledgling API is pretty much assured.
How performance (and visual fidelity) will really be affected in actual games is something we’ll have to wait to see. This is nothing more than a benchmark at the end of the day – and a synthetic one at that. With that said, it’s very difficult not to get excited about Futuremark’s 3DMark API Overhead test… if only as a glimpse into the future.