Investigating Core Count Scaling and DX12 Vs DX11 & Vulkan | Analysis

The excitement surrounding the launch of DirectX 12 and Vulkan seems so very long ago, these modern creations designed to lower the overhead on the system, recruit more CPU cores into running game logic and bring new features to the table. DX12 was first announced in March 2014, and debuted just over a year later alongside Microsoft’s Windows 10 OS (yes, Windows 10 launched in mid-2015).

Back then, Intel’s Haswell architecture was current, and their 6th generation Skylake architecture launched a little after DX12’s debut, with the I7-6700K and its ilk hitting store shelves in August of the same year. Of course, the mainstream Skylake CPUs only featured 4 CPU cores (and 8 threads in the case of the I7-6700K), a core and thread count that’s dwarfed by modern CPUs from both AMD and Intel.

In this updated article, we’ll be investigating the impact of core and thread count in modern titles with different APIs – DX12, DX11, and Vulkan. Last year, I did a similar video and article featuring different games and used the 6 core 12 thread I7-8700K for my investigations, along with both a GeForce GTX 1080 and a GeForce GTX 1080 Ti. But with this updated project, we’ve upped the ante considerably and are testing with both AMD and Nvidia hardware and a lot more titles.

The push for low-level APIs was really spurred on by AMD’s Mantle API, developed specifically for their own Radeon hardware. When the API launched we did some testing with the then-current cards such as the Radeon R9 280, and even on mid-range GPUs there was a tantalizing glimpse of the performance increase from a modern API.

Since then, two low-level APIs have been developed for the PC: DirectX 12, pioneered by Microsoft as the successor to DX11, and Vulkan, which is curated by the Khronos Group and acts as a successor of sorts to OpenGL. DX12 runs across the Windows environment only (so Windows 10 and Xbox One), while Vulkan is platform-agnostic; it runs on older versions of Windows too (so Windows 10 or, say, 7) and also supports other operating systems like Linux.

An API (Application Programming Interface) is extremely important because its job is to facilitate communication between the software (so a game for example) and the hardware itself. Lower-level APIs such as DX12 reduce the overhead of legacy APIs such as DX11, have less abstraction and are much better at taking advantage of more threads and CPU cores. The APIs accomplish this with a few trade-offs, though.
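To get a feel for why spreading work across threads matters, here is a minimal sketch using Amdahl's law. The parallel fractions below are hypothetical values chosen purely for illustration; they are not measurements of DX11, DX12 or Vulkan, and real engines behave far less neatly.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only `parallel_fraction` of the CPU work can be spread over `threads`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical fractions, for illustration only (not measured from any game or API).
mostly_serial = amdahl_speedup(0.5, 8)
mostly_parallel = amdahl_speedup(0.9, 8)

print(f"50% parallel work on 8 threads: {mostly_serial:.2f}x")
print(f"90% parallel work on 8 threads: {mostly_parallel:.2f}x")
```

With these made-up fractions, 8 threads give roughly 1.8x and 4.7x respectively, which is the general shape of the argument: the less of the frame's CPU work that is stuck on one thread, the more extra cores can help.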

The first is that the developer needs to do more work compared to writing DX11 code, and if a game doesn’t need the additional performance it’s quite frankly easier to just go the DX11 route. This means that Multi-GPU is much harder to achieve than with DX11, where an SLI profile could easily be created. It’s one reason multi-GPU is rarer these days, though DX12 can support MGPU and indeed can even mix and match GPU configs (so you can have an RTX 2080 Ti and an Intel IGPU in the same system doing work for example).

In the closing weeks of 2019, there are still lots of newly released games that rely on the use of DX11, and it will be curious how things change with the release of the next-generation consoles in about 12 months from now.

A few notes about the testing methodology – firstly, the hardware setup itself.

I’m using an Intel I9-9900K (provided kindly by Intel) that is locked at 5GHz across all of its cores, and then we use the MPG Z390 Gaming Pro Carbon motherboard (kindly provided by MSI) to enable or disable cores along with Intel’s Hyper-Threading. So, for example, we can disable Hyper-Threading to get a more ‘I7-9700K’ kinda performance, with the clear caveat that the additional L3 cache is still in place. But, for the sake of a nice consistent test platform, and given the challenges associated with procuring and swapping out all of the chips which would otherwise be required, we think this is a nice compromise. We also recently did a more in-depth investigation into whether Intel’s Hyper-Threading technology hurts gaming performance, so if you’d like to look at those benchmarks check them out here.

We are also testing with two graphics cards – the first is an Nvidia GeForce RTX 2080 Ti (thanks to our generous Patreons for helping us keep our equipment updated). I’ve gone ahead and increased the Power Limit to 111 percent (the highest our Gigabyte Windforce model goes) and played around with the fan curve a little. When the GPU is at full load, it hits between 1900 – 1930MHz on the core clock, but I haven’t manually overclocked it with any offsets (either to core or memory).

The second card is an AMD Radeon RX 5700 graphics card (check our full review of Navi 10 here), which was running at ‘stock’. This reference card was provided by AMD for testing, although given the reference design of the GPU custom cards will likely be capable of boosting to higher speeds, and of course, the RX 5700 XT would be faster still. Using the AMD Radeon RX 5700 in these tests will allow us to see how core counts and APIs affect AMD’s Navi 10 silicon.

We will benchmark at all 3 of the most popular resolutions (1080P, 1440P and finally 4K), though putting most emphasis on 1080P as we’re also attempting to analyze the role the number of threads plays in performance. I have also included a few 720P results with particularly demanding titles such as Remedy’s Control.

Other than all the hardware side of things, the games have been patched to the latest available using whatever distribution platform they’re on (such as Steam or Epic Games Launcher) and, finally, Windows has had the usual patches applied as of November 2019. In the case of footage that was captured (for the accompanying video), a second rig was used along with a dedicated capture card – so no additional load was put on the system. We will also be testing with a mixture of built-in benchmarks and manual runs.

Where a chart doesn’t specifically state the GPU model, the results are for the GeForce RTX 2080 Ti; results with the RX 5700 say so in the chart title.

The last caveat I’d like to discuss is our decision to overclock the Intel Core I9-9900K during our benchmarking. The base frequency of the CPU is 3.6GHz, and with one or two cores active the CPU can turbo up to 5GHz. However, this figure changes when threads start spilling over more cores – for example, loading all 8 cores of the CPU nudges the turbo down to 4.7GHz. That’s hardly a sedate pace, but it’s also not the same speed. I made the decision therefore to simply lock the CPU down to 5GHz.
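As a quick sanity check on how much the lock changes things, the arithmetic can be sketched as below. It uses only the 5GHz and 4.7GHz figures quoted above; the function name is just an illustrative helper, and clock speed is of course only a rough proxy for real performance.

```python
def clock_uplift_percent(locked_ghz, stock_ghz):
    """Percentage clock advantage of the locked frequency over the stock all-core turbo."""
    return (locked_ghz / stock_ghz - 1.0) * 100.0


# 5.0GHz locked versus the 4.7GHz all-core turbo mentioned in the text.
all_core_uplift = clock_uplift_percent(5.0, 4.7)
print(f"Locked clocks run about {all_core_uplift:.1f}% higher than the stock all-core turbo")
```

So with all 8 cores loaded, the locked configuration runs a little over 6 percent faster in raw clock speed than a stock chip would.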

This provides predictable clock frequencies no matter the threads and turns the tests into a question of threads and cores up against different APIs. Although, there is an argument to be made that running at ‘stock’ would be a better indicator of how a CPU performs in its default behavior, particularly with chips that aren’t unlocked. In the end, there was no right answer, but I decided to go for predictable clocks and focus on core count scaling.

With that all out of the way, let’s start the benchmarks.

3DMark’s API Overhead Test is first. This test was first implemented in 2015, in the infancy of DirectX 12, to compare the then-fledgling API against legacy DX11 code. Back then, AMD’s Mantle was still ‘a thing’, before AMD stopped development on it and offered it as the basis of Khronos Group’s Vulkan. Speaking of Vulkan, 3DMark’s API Overhead Test has since been updated to support the new Vulkan API.

Right off the bat, there are interesting results for the API overhead test. Enabling or disabling Hyper-Threading on the I9-9900K doesn’t really benefit performance that much on Vulkan, though there’s a little extra squeezed out of the DX12 results. On the lower core count configurations though, 2 cores with HT just lacks the grunt of 4 ‘real’ cores. In DirectX 11 single thread, the results barely change with the extra available threads (not surprising really). DX11 Multi Thread does its best to make use of the extra resources, but gets stomped by both modern APIs.

I have decided to include Ryzen 7 3700X vs I9-9900K results with our RX 5700, just so you can see that the CPU does extremely well in this test. We’ll investigate more with Ryzen 3000 soon, but you can see that the story is quite similar with the RX 5700 as with the RTX 2080 Ti, though DX11 MT and ST results are essentially a wash (margin of error).

The next synthetic is Unigine’s Superposition… and no, I didn’t make a mistake during benchmarks. The benchmark is entirely focused on pushing the GPU in the system as hard as possible, and your CPU can go sit down for a coffee while it’s running. 1080P Extreme is more taxing on the GPU than 4K Optimized (with the exception of VRAM), so there’s a little more variation when going down to just a single or dual CPU core configuration (I think of the I9-9900K with a single CPU core running as a Pentium 4 that’s been to the gym).

Ashes of the Singularity was one of the first games confirmed to support Microsoft’s DirectX 12 API, and later on, Escalation was released and then subsequently patched to also support Vulkan. In the first benchmark, we see how Vulkan and DirectX 12 perform against one another at 1080P. Vulkan is a slight win for this game in both CPU and GPU results; no matter the number of active threads or CPU cores, DX12 slips a little behind in the benchmark.

Testing with AMD’s Radeon RX 5700 with the built-in GPU test, we have all 8 cores and 16 threads enabled and are simply looking at the FPS using the respective APIs. DirectX 11 doesn’t give a ‘CPU average’ result here, hence why the result is missing.

The Radeon RX 5700 puts both Vulkan and DirectX 12 up against one another in Ashes of the Singularity Escalation using the built-in CPU benchmark.

For AOTS then, Nvidia’s hardware seems to come out slightly on top with Vulkan, and AMD’s hardware gives a slight nod towards opting for DX12.

Batman Arkham Knight – one of the oldest games in the test suite. The RX 5700 has plenty of VRAM and GPU grunt compared to the cards available at the time of the title’s launch, so the benchmark runs pretty darn smoothly.

Now onto Battlefield 5, where we chose the highest quality in-game settings (with Ray Tracing disabled for obvious reasons). Let’s look at how the game scales across available CPU cores and threads first.

Let’s now focus on how the game performs on DirectX 11 vs DirectX 12, with the full 8 cores of the I9-9900K enabled, but toggling Hyper-Threading. In one graph, it’s easy to see BF5 is one game you just don’t want to run in DX11 mode; average and max FPS are considerably higher with DX12 enabled. This game, though, is one title that definitely doesn’t like Hyper-Threading. We did some testing on this with our ‘Does HyperThreading Hurt Gaming Performance’ article and video, and BF5 was one of the few games which exhibited this behavior.

And what about AMD’s drivers? Well, a similar story. Even with the RX 5700’s more modest performance, DX12 is consistently a better choice than the older API: with just 4 cores and 4 threads, DX12 averages 116FPS versus 87FPS for DX11, and holds a 5 FPS lead in minimum.

Blair Witch is a 2019 release which uses the always popular Unreal Engine 4 to great effect. The title also hits the GPU much harder than you might expect, even though we’re testing CPU performance scaling at just 1080P. I decided to add in the 1 percent low here along with the min FPS, and you can see that even dropping down to just two CPU cores with Hyper-Threading disabled, performance wasn’t radically impacted. There’s not a whole lot to say about Blair Witch’s benchmark result that isn’t evident in the performance data.
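For anyone unfamiliar with these metrics, here is a minimal sketch of how average, minimum and 1 percent low FPS can be derived from per-frame render times. It assumes one common definition of the 1 percent low (the average frame rate over the slowest 1 percent of frames); capture tools differ on this, and the frame times below are made-up values for illustration, not our benchmark data.

```python
def fps_summary(frame_times_ms):
    """Summarise a capture given per-frame render times in milliseconds."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")

    total_ms = sum(frame_times_ms)
    average_fps = 1000.0 * len(frame_times_ms) / total_ms

    # The single slowest frame sets the minimum FPS.
    minimum_fps = 1000.0 / max(frame_times_ms)

    # Slowest 1% of frames (at least one frame), averaged and converted to FPS.
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, len(slowest_first) // 100)
    one_percent_low_fps = 1000.0 * count / sum(slowest_first[:count])

    return {
        "average": average_fps,
        "minimum": minimum_fps,
        "one_percent_low": one_percent_low_fps,
    }


# Made-up capture: 198 smooth frames at 10 ms plus two stutters.
sample = [10.0] * 198 + [40.0, 50.0]
summary = fps_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.1f} FPS")
```

The point of the example is that two stutters barely move the average, while the minimum and 1 percent low figures drop sharply, which is why they are useful for spotting hitching on low core count configurations.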

Civilization 6 also has a few built-in benchmarks, but we use the ‘Turn Time’ which… well, measures how long it takes for AI to make their turns. It’s a nice and consistent benchmark, and we can use it to see if Civ 6 is better in DX11 or DX12. DX12 comes out on top here, consistently, even in extremely low core count scenarios. We set the game to have a lower system memory footprint and GPU footprint (for example, just 1080P with lower AA) so that the CPU is what gets hit hardest.

Remedy’s Control is up next, using the ‘Northlight’ engine for its fancy graphics. We’ve totally disabled Ray Tracing for this set of tests, because Control already hits graphics cards with a sledgehammer at lower resolutions. Our RTX 2080 Ti hit 100 percent GPU usage throughout the tests – and given we were testing the beginning section of the game you can see that we didn’t really put much strain on the CPU.

With the above results, we look at the average performance of 2 areas near the start of the game. Consistently, DirectX 12 loses in performance to DirectX 11 here. Of course, with DX11 you cannot enable shiny effects like Ray Tracing and Nvidia’s DLSS (which weren’t enabled for our DX12 tests).

Using the built-in benchmark utility of Far Cry 5, we cranked up the quality settings to their highest and then let the relatively short flyby do its thing. Far Cry 5’s Dunia engine really loves single-core performance (as do older entries in the series), and there is no FPS advantage to higher core count processors at all really. Even an Intel I5-6600K would provide enough grunt in more GPU constrained scenarios.

Gears 5 is our next test, which of course was built from the ground up to support both Windows 10 and DirectX 12. We’ve a mixture of manual runs (using the initial run when you get the Skiff) along with the built-in benchmark. At 4 cores or less, you’ll definitely notice micro-stutter creep in and spoil the show, so you’ll want at least 4 physical cores in an ideal world. We attempted to test with only 2 CPU cores active (HT disabled) and found the game wouldn’t load.

Using the built-in benchmark we can see how the performance of the game scales across resolutions and core counts. Just like in the last test, we disabled the minimum FPS target, because otherwise the game will use dynamic resolution scaling to reduce GPU workload when things get extra tough. As you can see, even the RTX 2080 Ti struggles with this game at 4K.

Same test conditions as above, but switching things to AMD’s Radeon RX 5700. Because we had less time with our review sample (we needed to do a full review, an all-AMD build, plus a few other upcoming projects) we couldn’t test it with as many different CPU configurations as we did our Nvidia GPU (once again, thanks to our Patreons, who help us buy samples with their support). The built-in benchmark illustrates the performance impact of dropping to just 4 cores for the I9-9900K versus its default fully enabled state. 4K at over 30FPS is also just outside the scope of what’s possible on an RX 5700 class card.

For our testing on Metro Exodus, we used the built-in benchmark and cranked the in-game quality settings to their highest – with the obvious exception of Ray Tracing. We decided to throw in 720P testing here since the game is so punishing. The results are pretty curious – for our RTX 2080 Ti, 1080P has slower results when DX12 is enabled versus DX11, and this is echoed across different core configurations too. But when running Metro Exodus at just 720P, the benchmark nudged in favor of DX12, with a 10 FPS advantage with the 8 core / 16 thread configuration.

Using AMD’s Radeon RX 5700, DX12 pulls ahead, even at the super punishing 1440P resolution, showing that API and optimization can be critical in GPU-bound situations. There is still room left in the RX 5700’s tank, as we showed in our overclocking and tweaking guide, where we got Metro Exodus’ performance to virtually lock to 60FPS by cranking the performance dials of Navi 10’s silicon to the limits.

The Tomb Raider titles are staples in our test suite, and Lara’s latest outing “Shadow of the Tomb Raider” is no different; we use a mixture of built-in benchmarks and manual runs to give us our results. We’ll start things with the RX 5700, with all of the settings cranked to their highest – but we disable Ray Tracing for the obvious reasons. The built-in benchmark results for the RX 5700 show a radical dip in performance for the DX11 code. Minimum FPS performance is what really suffers here, with the average FPS not really changing much between the two APIs.

I will admit, the announcement of Capcom remaking Resident Evil 2 had me super excited, and it’s amazing just how good this game looks and performs. There’s no benchmark utility in the game (an annoying trait of many Capcom games), so instead we need to do a manual run. As you can see, with DirectX 11, the game runs with only a single CPU core! Albeit with terrible stuttering and instability (we had the game crash at least once). 2 physical cores ‘helps’ a bit, but it’s still a train wreck to play. The difference between 2 cores 2 threads and enabling Hyper-Threading is huge, and the min FPS literally increases by about 6x.

With DirectX 12, RE2 wouldn’t even load with fewer than 4 threads available.

Looking at the DirectX 12 versus DirectX 11 results, the game definitely runs better in DirectX 11 mode with Nvidia hardware, especially on lower core count processors. We were testing Resident Evil 2 with its graphics settings at their highest, but opted to run with 1GB textures to keep consistency with our other benchmarking data.

Extending the test to more resolutions, there’s not much to raise an eyebrow: 4K average FPS results change very little, and the RX 5700 is more comfortable rendering at 1440P.

A manual run of Shadow of the Tomb Raider using DX11, this time switching to Nvidia’s GeForce RTX 2080 Ti.

Same as above, but this time using DX12 for the API.

Finally, we combine the average FPS of DX12 and DX11 at both 1080P and 4K on the RTX 2080 Ti. No matter what we did, Shadow of the Tomb Raider refused to launch with less than 2 cores and Hyper-Threading disabled when running in DirectX 12 mode, though it would run with DX11… albeit with very poor minimum FPS (as you can see a few graphs above this one). There’s not much to say here – DX12 is the better choice on both AMD’s and Nvidia’s modern hardware, and lower core counts still benefit too thanks to the lower CPU overhead.

CD Projekt Red’s Witcher 3, running at the highest possible settings with a manual run on a horse near the start of the game. As is always important in a manual run, we want a consistent and easily repeatable section. You might spot the 0.1 FPS for the minimum results with 1 Core 2 Threads, and it’s not a typo. When riding (actually when doing about anything at all) the game would freeze for periods of time and wasn’t very stable. While the 2 cores results aren’t ideal (and loading could take what seemed forever) it’s still pretty impressive that the game is even remotely playable in this state.

World War Z is chaotic, crazy and fun… but also not super demanding. The RX 5700 does a respectable job here even at 4K, and once again overclocking could easily get the card to maintain the critical 60FPS target. At this resolution, the API choice isn’t as important, as the GPU just can’t crank out more frames, but at lower resolutions Vulkan annihilates DX11.
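The reasoning about resolution and API choice can be sketched with a very simple bottleneck model, where the delivered frame rate is whichever of the CPU or GPU limits is lower. This is a deliberate simplification, and the limits used below are hypothetical numbers picked for illustration, not results from our World War Z runs.

```python
def delivered_fps(cpu_limit_fps, gpu_limit_fps):
    """Frame rate in a simple bottleneck model: the slower of the two limits wins."""
    return min(cpu_limit_fps, gpu_limit_fps)


def bottleneck(cpu_limit_fps, gpu_limit_fps):
    """Name the component that is holding the frame rate back."""
    return "CPU" if cpu_limit_fps < gpu_limit_fps else "GPU"


# Hypothetical (CPU limit, GPU limit) pairs. A lower-overhead API is modelled
# as raising the CPU limit only; the GPU limit depends on resolution.
scenarios = {
    "4K, older API": (120, 60),
    "4K, newer API": (200, 60),
    "1080P, older API": (120, 220),
    "1080P, newer API": (200, 220),
}

for name, (cpu_limit, gpu_limit) in scenarios.items():
    fps = delivered_fps(cpu_limit, gpu_limit)
    print(f"{name}: {fps} FPS ({bottleneck(cpu_limit, gpu_limit)} bound)")
```

In this toy model both 4K cases land on the same frame rate because the GPU is the limit, while at 1080P the API with the higher CPU limit pulls clearly ahead.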

Switching to our Nvidia GeForce RTX 2080 Ti, we can easily see World War Z isn’t very demanding, and even 2 CPU cores with Hyper-Threading minces through the built-in benchmark… though in just about every single case, Vulkan is the better API choice. With just 1 CPU core, the game simply wouldn’t load with Vulkan, so I guess that’s a win in performance for DX11, right?

Conclusions of Core Count Scaling and DirectX 12 vs DirectX 11 Vs Vulkan

There are a lot of results to unpack here, and ultimately this isn’t easy to wrap up in just a few paragraphs, but we need to start somewhere and that might as well be Intel’s Hyper-Threading technology. While there are definitely a few wins for disabling HT on the I9-9900K (and you do still get the extra cache and better overclocking compared to the I7-9700K), the reality is the performance differences are often so marginal it’s not worth it. You need to remember that our test system doesn’t have anything running in the background such as Chrome, anti-virus or chat applications either, and frankly, with those running the extra available threads might make the results tighter still.

Then again, a few games we didn’t test here can benefit from HT disabled (such as CSGO) and games that do a poor job of managing multiple threads can also be a win for HT off. But unless you’re playing competitively… I wouldn’t worry about it.

Shifting to the core count scaling – and we’re in a tricky time of the market right now. Intel’s Core I9-9900K has 8 cores and 16 threads, but in Q1 2020 Comet Lake is expected to launch with 10 cores and 20 threads, and AMD has already released the Ryzen 3000 series, with up to 16 CPU cores. Ultimately, the next-gen consoles will probably be the driving force pushing developers to better leverage your system’s CPU. Both Microsoft and Sony have farmed out CPU manufacturing duties to AMD as they did with their Xbox One and PS4 consoles respectively.

This time though, AMD won’t be providing either company with a low-power CPU designed for mobile devices, such as Jaguar. Instead, the processor will be derived from their Zen 2 architecture and will sport 8 CPU cores with SMT (so 16 threads), running at a much beefier clock frequency. Going by what both companies have said, this should allow developers to push much larger and more open worlds, with a lot more detail, more characters and better AI… and this will likely translate to greater demands on the PC.

I can’t benchmark games that haven’t been released yet, and we can’t predict how developers will leverage hardware two or three years from now. What I can say in the here and now is that most games are still perfect with only 4 cores and 8 threads, though minimum FPS can suffer in more demanding titles, and if you’re a game streamer or want to run other things in the background the situation is a bit different.

But, for budget gamers, CPUs such as the I5-9600K and AMD’s Ryzen 5 3600 are pretty compelling. The latter works great in cheaper AM4 motherboards (such as a B450) and at only $200 comes with its own cooler too, making it a perfect option for those wanting a more wallet-friendly alternative to the screaming high-end CPUs of 2019. We went into more detail on a budget all-AMD build featuring the Ryzen 7 3700X (8 CPU cores / 16 threads) and an RX 5700 too, if you are curious.

As for APIs – the adoption rate for DX12 and Vulkan is speeding up, and that’s only a good thing for PC gamers. Owners of SLI setups might be cursing the lack of support for their second graphics card, but with Ray Tracing becoming normal for the next-gen consoles, visually we should see some impressive things.

Both of these new-fangled APIs improve performance on games such as BF5 even running on hardware with fewer threads active… remember, DX12 and Vulkan aren’t just designed to ‘talk’ to more CPU cores at once, but also have a lower performance impact on the hardware with less overhead. Not all games have a clear advantage for DX12 over, say, DX11, but in general my default behavior now is to just turn on DX12.
