If you’ve been reading up on the hardware of the Xbox One and the PlayStation 4, you’ll know the two consoles share a very similar CPU architecture. Both use eight AMD Jaguar CPU cores, with the only real difference between them being clock speed: Microsoft’s Xbox One has a roughly ten percent advantage (1.75GHz versus the PlayStation 4’s 1.6GHz). Until now, both machines have reserved two CPU cores purely for the system, meaning they’re unusable by game developers.
We’ve since learned that things changed over the Christmas break, when hackers known as H4LT leaked the Xbox One’s SDK, firmware and documentation. The leak shows that Microsoft have changed things around, and now provide up to 80 percent of the seventh CPU core to game developers. It’s reasonable to assume this is why the Xbox One has enjoyed some performance advantages (in other words, higher frame rates) over the PlayStation 4 in certain games, or in certain scenarios within games.
We’ve covered numerous titles recently, including Dragon Age: Inquisition, Assassin’s Creed Unity and Grand Theft Auto 5, and while the Xbox One may be at a disadvantage in, say, resolution, in demanding areas of a game (such as the heavy crowds in AC: Unity, or busy traffic while driving quickly in GTA5) the frame rate of Microsoft’s console certainly holds up a little better. Until now there was a lot of speculation as to the cause – it had been theorized by some in the tech industry (including ourselves) that the PS4 versions’ additional resolution (and sometimes effects – see the foliage in GTA5, for instance) was to blame. But it would appear this isn’t necessarily the whole story, and the additional CPU grunt is likely helping in certain scenarios.
You might remember when we covered Ubisoft’s GDC 2014 presentation, which handily showed the same Ubisoft code running on both platforms, and the disparity between the two pieces of hardware. Given 5ms (milliseconds) of CPU time running a cloth simulation, the Xbox One’s CPU proved to perform slightly better than the PlayStation 4’s. However, the PS4’s GPU is considerably more powerful – 1.84 TFLOPS versus the Xbox One’s 1.32 TFLOPS (not counting any GPU reserves) – so when the same simulation was ported over to the GPU (using compute), the PS4 bettered the X1. It’s tempting to point the finger at these SDK updates as the cause of the CPU performance difference, but the date of the conference (August 2014) and the dates of these SDK updates simply don’t match up, so it’s likely they had nothing to do with it.
To clarify: in most gaming situations (whether on desktop or console) the CPU isn’t as responsible for frame rate or resolution as the GPU is – GPU performance is usually the limiting factor. But when there’s a lot of AI to process, a lot of ‘draw calls’ and other dispatches to the GPU, or similar CPU-heavy scenarios, the Xbox One could indeed hold a slight advantage, because its CPU is in effect ‘faster’ than the PS4’s (or, to be more technically accurate, the leaked SDK and documentation show that developers have more CPU resources to work with on the Xbox One than on the PS4).
Before we continue speculating, it’s important to look at the flip side; after all, there’s a reason that CPU core was initially ‘off limits’ to developers. Freeing up the Xbox One’s seventh CPU core means certain voice commands go bye-bye – specifically the custom, game-exclusive ones (not generic X1 commands such as “Xbox, record that”). Examples include the custom commands found in the Xbox One launch title Ryse: Son of Rome. Developers are now left with a choice: more CPU performance and memory bandwidth, or custom voice commands. I think the direction many will choose is fairly clear.
This isn’t the end of it – you’ll have noticed the key phrase “up to 80 percent”, and that’s because certain voice commands eat up a lot of processing power. To put it another way, if you ask your Xbox One to record something (using a voice command), the console will use about 50 percent of the seventh core purely on… well, recording that. This means two things: CPU time for developers on the seventh core can fluctuate, and it’s still an advantage for developers. Let’s be honest, you’re not really asking your Xbox One to ‘record that’ too often, right?
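To make the fluctuating budget concrete, here’s a tiny sketch of the seventh-core arithmetic. The “up to 80 percent” cap and the roughly 50 percent cost of a voice-command capture come from the leaked documentation; the assumption that the capture cost simply comes out of the game’s share is our own simplification, not something the leak spells out.

```python
# Seventh-core time budget sketch. The 80% cap and ~50% voice-command
# cost are from the leaked SDK docs; the straight subtraction below is
# a simplifying assumption for illustration.

CORE_SHARE_CAP = 80    # percent of the 7th core available to game code
VOICE_CMD_COST = 50    # percent reportedly consumed by "Xbox, record that"

def available_share(voice_command_active: bool) -> int:
    """Percent of the 7th core left for the game at a given moment."""
    used_by_system = VOICE_CMD_COST if voice_command_active else 0
    return max(0, CORE_SHARE_CAP - used_by_system)

print(available_share(False))  # 80
print(available_share(True))   # 30 - the budget dips while a command runs
```

The point is simply that game code scheduled on that core has to tolerate its share dropping sharply for the duration of a command, which is why “up to” is doing a lot of work in Microsoft’s phrasing.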
The AMD Jaguar powering both consoles isn’t a RISC CPU but a CISC (Complex Instruction Set Computer) design, and while eight cores might sound impressive, the x86-64 processor isn’t a patch on a desktop CPU. Interestingly, despite numerous improvements to both OoO (Out of Order) execution and branch prediction (which essentially guesses the most likely path through code to reduce CPU stalls and improve cache hit rates), the architecture can still force developers to work harder at optimization. This has been stated by numerous developers, including Sony’s Naughty Dog over at SINFO. Naturally, the PS4 isn’t the Xbox One, but given the architectural similarities between the two systems, the same issues are present over at camp Microsoft too.
The Xbox One and PS4 CPUs (as we’ve stated) are built from two CPU modules (four cores each). Each module has its own 2MB Level 2 cache, while each core has 32KB of Level 1 cache all to itself. If a ‘local’ Level 2 cache miss happens, the console will then probe the other module’s Level 2 cache – but it must do so via the console’s north bridge. In other words, there’s no fast connection between the two caches. In the table below, you can see the rather large difference on the Xbox One between the cycle cost of a local cache hit and a remote one.
| Xbox One cache hit type | Latency (lower is better) |
| --- | --- |
| Remote L2 hit | Approximately 100 cycles |
| Remote L1 hit | Approximately 120 cycles |
| Local L1 hit | Three cycles for 64-bit values; five cycles for 128-bit values |
| Local L2 hit | Approximately 30 cycles |
If you’re unsure what any of this means, imagine that some data lives in module A’s L2 cache, and a core on module B needs it. That core will ask its local module’s L2 cache “Hey, do you have this?” and if the answer is yes, all is well. If not, it needs a time-consuming trip through the north bridge to module A’s L2 cache; if the data is there, the hit costs around 100 cycles. If the answer is still “nope”, it has no choice but to go out to the system’s DDR3 RAM.
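The probe sequence above can be sketched as a toy cost model, using the approximate cycle counts from the table. The lookup order (local L2, then the remote module’s L2 via the north bridge, then DDR3) follows the article; the DDR3 cost used here is a hypothetical placeholder, as the leak figures quoted cover cache hits only.

```python
# Toy cost model of the Xbox One's split cache topology, using the
# approximate cycle counts from the table above. The DDR3 figure is an
# illustrative assumption, not from the leaked documentation.

LOCAL_L2_HIT = 30     # cycles, approximate
REMOTE_L2_HIT = 100   # cycles, approximate (a trip through the north bridge)
DDR3_ACCESS = 200     # cycles, hypothetical placeholder for a RAM trip

def load_cost(addr: int, local_l2: set, remote_l2: set) -> int:
    """Approximate cycle cost for a core loading `addr` from one module."""
    if addr in local_l2:
        return LOCAL_L2_HIT
    if addr in remote_l2:   # "Hey, do you have this?" across the bridge
        return REMOTE_L2_HIT
    return DDR3_ACCESS      # miss everywhere: out to system DDR3 RAM

module_a_l2 = {0x1000, 0x2000}  # lines cached by module A
module_b_l2 = {0x3000}          # lines cached by module B

# A core on module B asking for data that only module A holds:
print(load_cost(0x1000, module_b_l2, module_a_l2))  # 100
```

The practical upshot for developers is that keeping threads that share data on the same four-core module avoids the roughly 3x penalty of a cross-module hit.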
Another by-product of freeing up the Kinect reservation is that developers gain not only precious processing time on the extra CPU core, but memory bandwidth too. Allocating bandwidth to a function that isn’t being used (Kinect title-specific speech commands) doesn’t make a lick of sense, and so an additional 1GB/s of memory bandwidth is now available to game developers. This bandwidth can be used by either the CPU or the GPU – additional flexibility for developers who feel they’re bandwidth-starved.
The Xbox One’s memory bandwidth has certainly been a subject of much contention – both for Microsoft’s use of DDR3 memory (which has a peak bandwidth of 68GB/s) and for the small amount of eSRAM available on the console. Like the PlayStation 4, the Xbox One doesn’t have dedicated VRAM (Video RAM); instead the frame buffer and other graphics assets (such as textures) are stored in the console’s DDR3 or eSRAM (or, in the PS4’s case, GDDR5 RAM). The eSRAM on the Xbox One is embedded directly on the GPU, and while its advantage in raw throughput is fairly modest, the main benefits are far lower latency and no contention for resources (in other words, the eSRAM isn’t accessed by, say, the CPU – it’s available only to the GPU).
The peak theoretical rate of the eSRAM is about 204GB/s, but that isn’t sustainable in real life beyond short bursts. According to Microsoft, you should plan your rendering scenarios around the eSRAM running at a peak of 102GB/s, though in certain tasks (say, color blending) you might see 10 to 30 percent higher performance. The PlayStation 4 instead uses GDDR5: its theoretical peak is 176GB/s, but developers have said that in practice it’s more like 172GB/s. It does mean PS4 developers don’t have to ‘worry’ about eSRAM usage, though.
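To put those bandwidth figures in perspective, a quick back-of-the-envelope calculation shows what they buy per rendered frame. The 102GB/s (Microsoft’s eSRAM planning figure) and 172GB/s (the developer-reported practical GDDR5 figure) come from the numbers above; the 60fps target is an illustrative choice of ours.

```python
# Per-frame traffic budgets implied by the sustained bandwidth figures
# quoted above. The 60fps frame rate is an illustrative assumption.

GB = 1_000_000_000

def bytes_per_frame(bandwidth_gb_s: float, fps: int) -> float:
    """Bytes of memory traffic a sustained bandwidth allows per frame."""
    return bandwidth_gb_s * GB / fps

# Xbox One eSRAM at Microsoft's 102GB/s planning figure:
print(bytes_per_frame(102, 60) / 1e6)  # 1700.0 (MB of traffic per frame)

# PS4 GDDR5 at the practical 172GB/s figure:
print(bytes_per_frame(172, 60) / 1e6)  # roughly 2867 MB per frame
```

Remember the eSRAM budget applies only to its 32MB working set; the rest of the Xbox One’s traffic goes over the 68GB/s DDR3 bus, which is where the contrast with the PS4’s single unified GDDR5 pool shows up.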
Naturally, the extra ‘CPU allowance’ isn’t the only change Microsoft have implemented in the SDK. Numerous improvements have arrived in the monthly SDK updates: back in July 2014, for instance, Microsoft improved typical ‘DrawIndexed’ call performance on the CPU by up to 68 percent, and in the same month GPU performance went up by an average of 3.5 percent.
If you follow the timeline of changes from the initial version (released back in April 2012 – almost three years ago now!), it’s hard not to spot the rather impressive performance improvements Microsoft have made to the Xbox One. Considering the driver, eSRAM and optimization on the console were a nightmare (as many developers stated at or around the console’s launch), this is only positive news.
For the future, there’s also DirectX 12 – and who knows what role it will play in the performance of the Xbox One.
While we’re on the subject of optimization, PIX itself has undergone numerous improvements. PIX (an acronym for Performance Investigator for Xbox) is the primary tool available to Xbox developers for performance analysis and debugging on the Xbox One’s CPU and GPU. Currently, it’s very hard for developers to figure out how using the Xbox One’s seventh core will impact gameplay, or what will and won’t cause issues. For gamers, this might not sound as exciting as “moar CPU nao”, but in reality, unless developers know where their code isn’t optimized, they’ll never get peak performance out of the Xbox One hardware.
It’s certainly true that there’s a lot more data to analyze from the leak, and I’ve little doubt Microsoft are more concerned that it could be used for… let’s call them “homebrew” applications. You might recall that every Xbox One console can be turned into a development kit, but don’t get too excited: despite the software and instructions for doing so being included in the leak, it requires one thing the public doesn’t have – server-side authentication. Without that, you can write the best code in the world, compile it and have it ready to run on the hardware, but you won’t be able to run it. Think of it as having a fully built PC with no operating system, and no USB drive, DVD drive or any other way to install software onto it.