Microsoft’s GDC 2014 announcement of DirectX 12 came as little surprise to those in the tech industry; Microsoft had boasted and teased about its release well in advance, after all. The real surprise was just how closely it seemed to resemble AMD’s Mantle technology, and that it would work on current graphics cards. If you’ve got an AMD GCN-based GPU or an Nvidia Fermi-based graphics card, you’ll be good to go for at least some of DX12’s most important features – the low-level access which aims to reduce the associated CPU overhead.
Microsoft is still keeping many of DX12’s features close to its chest, including which PC operating systems it will work on. But if we play the technology equivalent of Sherlock Holmes, we can piece together enough clues to understand many of the basic concepts. Clearly, low-level access to the GPU is important – but let’s explore the why before the how.
Why is DX12 Focused On Low Level API Access?
If you’ve been following the graphics card industry since the latter half of last year, there’s a good chance you’ve heard of AMD’s own answer to low-level access – Mantle. Prior to that, lower-level access to hardware was always a “that’d be nice” wish. Several times I’d remarked in my own videos and articles that an improvement to the PC’s API was needed, but it was a fairly hushed subject. PCs have typically simply overpowered the problem, but in recent years this approach isn’t as effective as it once was – primarily because of the need for better multi-threading and the slowing pace of per-core (IPC) gains from CPUs.
Consoles such as the Xbox One and PS4 have it fairly easy: if you buy a PS4 now and your friend Bob buys his two years from now, the core specs of the system won’t change. Sure, he might get a larger hard drive, or possibly a smaller form factor, but the raw performance of the machine stays the same. The software, drivers and operating system will become better optimized, but the hardware itself remains a fixed spec. This means the API (Application Programming Interface) can be written to target those exact specs. In short, it means console hardware that is low end compared to its PC counterparts can perform far more effectively than a similarly specced PC could.
3DMark is multithreaded (hence why users of CPUs with either Hyper-Threading or more cores score higher in CPU-related tests). But due to the extensive overheads incurred by runtimes and drivers, there’s a lot of idle time per core.
PCs traditionally work with one of two APIs: Microsoft’s DirectX (and therefore Direct3D) or OpenGL (Open Graphics Library), which is developed by the Khronos Group. OpenGL, of course, works across a variety of platforms (Windows and Linux, for example), whereas D3D focuses on Windows (and now, of course, the Xbox uses its own version too). The thing is, these APIs are higher level than their console equivalents, essentially forming an abstraction layer. The game interfaces with DirectX, which in turn interfaces with the graphics driver, which in turn speaks to the GPU and tells it to “do stuff” on screen. The last time we really saw a lower-level API was 3DFX’s Glide, which was written specifically for the Voodoo graphics cards. Popular games of the era (Quake and Tomb Raider, for example) would release patches to enable Glide support.
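To make that layering concrete, here’s a deliberately simplified sketch of what happens on the CPU every time a game asks a high-level API to draw something. The types and functions below are hypothetical stand-ins (this is not real Direct3D or driver code); the point is simply that every draw call pays a toll in the runtime and the driver before the GPU sees anything.

```cpp
// Conceptual sketch only -- the types and functions here are hypothetical
// stand-ins, not real Direct3D or driver entry points.
#include <cstdio>

// The game sees only the API's abstraction of the hardware.
struct Mesh { int indexCount; };

// "Driver" layer: translates generic commands into GPU-specific ones.
void driver_submit(const Mesh& m) {
    // State translation and command buffer building -- all CPU work.
    std::printf("driver: building GPU commands for %d indices\n", m.indexCount);
}

// "API runtime" layer (think Direct3D): validation and bookkeeping.
void api_draw_indexed(const Mesh& m) {
    // State tracking and error checking -- more CPU work.
    driver_submit(m);
}

int main() {
    Mesh meshes[] = { {36}, {1200}, {4800} };
    // The game issues one draw call per object; every call passes through
    // the runtime and driver layers before anything reaches the GPU.
    for (const Mesh& m : meshes)
        api_draw_indexed(m);
}
```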
This higher-level approach allowed a huge variety of hardware and drivers to co-exist in the same ecosystem and reduced compatibility trouble. In other words, if you had an Nvidia graphics card and your friend an AMD graphics card, the game would work on both of your machines without too much fuss. The chip makers could then optimize their drivers to the best of their ability to improve performance in a specific title, which clearly mattered most for AAA games, which would often shift new and expensive graphics cards.
This approach is great, but it does have drawbacks which are rearing their ugly heads more frequently. CPUs today aren’t making the same performance strides they were several years ago, particularly on a per-core / per-thread basis. IPC (instructions per clock) improvements are coming far more slowly than they once did, and simply increasing clock speeds isn’t something you can do forever and a day. Intel found this out the hard way back in the days of the Pentium 4, when it had originally planned (or at least boasted) that the NetBurst architecture could reach 10 GHz. In reality, it achieved roughly 40 percent of that figure. Power leakage, heat and other factors simply stopped Intel from getting there.
If we examine modern CPUs, from Sandy Bridge to Haswell for example, the architecture hasn’t vastly improved despite the few years separating them. Sure, Haswell has additional instructions, such as AVX2, but in terms of performance it’s only roughly 20 percent faster than Sandy Bridge in most applications. So CPUs have had to find improvements elsewhere – mostly by becoming more parallel in execution, handling more threads and, in turn, adding more CPU cores.
Meanwhile, GPUs aren’t suffering from this problem and have continued to grow rapidly in performance, with modern GPUs pushing through many TFLOPS of data. To put this into perspective, the total CPU performance of the PS4 is a smidgen over 100 GFLOPS (possibly up to around 110 GFLOPS depending on clock speed). Meanwhile, the PS4’s GPU puts out the much larger figure of 1.84 TFLOPS. The problem is compounded because much of the work the CPU does for the GPU is preparing jobs for it, and with DX11 and other higher-level APIs much of that work is fairly single-threaded. In turn, this means you’re left with a single CPU core doing far more work than the others, and in reality this slows the performance of the game.
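If you’re curious where those headline numbers come from, here’s a quick back-of-the-envelope calculation using the commonly cited PS4 specs: eight Jaguar cores at 1.6 GHz on the CPU side, and 18 GCN compute units at 800 MHz on the GPU side. The per-cycle FLOP figures are theoretical peaks, so treat the results as rough illustrations rather than measured performance.

```cpp
// Back-of-the-envelope peak-throughput arithmetic using commonly cited
// PS4 specs. These are theoretical peaks, not measured performance.
#include <cstdio>

int main() {
    // CPU: 8 Jaguar cores * 1.6 GHz * ~8 single-precision FLOPs/cycle (128-bit SIMD)
    double cpuGflops = 8 * 1.6 * 8;        // ~102.4 GFLOPS
    // GPU: 18 CUs * 64 lanes * 2 FLOPs/cycle (fused multiply-add) * 0.8 GHz
    double gpuGflops = 18 * 64 * 2 * 0.8;  // ~1843 GFLOPS, i.e. ~1.84 TFLOPS
    std::printf("CPU peak: ~%.1f GFLOPS\n", cpuGflops);
    std::printf("GPU peak: ~%.2f TFLOPS\n", gpuGflops / 1000.0);
    std::printf("GPU/CPU ratio: ~%.0fx\n", gpuGflops / cpuGflops);
}
```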
Clearly, lower-level APIs already have the advantage of being ‘lighter’, with less translation needed between the game’s original code, the API and, eventually, the graphics card. But they also make it far easier to take the bull by the horns and write rendering code that is multi-threaded by nature. So we PC gamers are starting to live with an irony: massively more powerful GPUs and far higher single-threaded CPU performance, yet a PC simply cannot handle the same number of draw calls as a console, which often costs half the price at most. This is partly down to the lower-level API, but also because console games are pushed to be multi-threaded in a much more efficient way. For an example of this, check out our Naughty Dog PS4 technology breakdown.
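As a rough illustration of what multi-threaded rendering means in practice, the sketch below spreads command recording for a scene across several worker threads instead of leaving one core to prepare everything. The CommandList type and record_scene_chunk() function are hypothetical placeholders rather than any real API – they’re only there to show the shape of the approach that lower-level APIs make practical.

```cpp
// Conceptual sketch of multi-threaded command recording. CommandList and
// record_scene_chunk() are hypothetical placeholders, not a real graphics API.
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

struct CommandList { std::vector<std::string> commands; };

// Each worker thread independently records the draw calls for its slice
// of the scene, instead of one core preparing everything serially.
void record_scene_chunk(CommandList& list, int firstObject, int lastObject) {
    for (int i = firstObject; i < lastObject; ++i)
        list.commands.push_back("draw object " + std::to_string(i));
}

int main() {
    const int objectCount = 10000;
    const int workerCount = 4;
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;

    const int chunk = objectCount / workerCount;
    for (int w = 0; w < workerCount; ++w)
        workers.emplace_back(record_scene_chunk, std::ref(lists[w]),
                             w * chunk, (w + 1) * chunk);
    for (auto& t : workers) t.join();

    // In a real renderer the finished command lists would now be submitted
    // to the GPU in order; here we just count what was recorded.
    size_t total = 0;
    for (const auto& l : lists) total += l.commands.size();
    std::printf("recorded %zu draw commands across %d threads\n", total, workerCount);
}
```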
While Mantle is certainly a compelling option – particularly as it’s on the market for game developers right now, unlike DX12, which we’ll have to wait until the latter part of next year for – there are few who’ll deny that DX12 will have the greater impact on games developers. It’s the API most developers are used to coding in on PC, game engines readily support it, and it doesn’t hurt that it shares a few similarities with the Xbox One API either.