The Xbox One has now been released in 13 territories around the world, and people have torn the thing open to explore just what makes the black box tick. While we have a reasonable idea of how the system works, there are still a few mysteries (mostly around the GPU’s compute capabilities) which we don’t fully understand yet, at least going by what Microsoft has said in various interviews with members of the press. In this article we’re going to round up the X1’s overall hardware specs.
Xbox One APU
The APU inside the Xbox One is known as model 001 DG3001FEG84HR. This APU contains the CPU, the GPU and of course the eSRAM and hardware move engines. The chip is built on a 28nm process, and at first glance seems similar to the PlayStation 4’s. Both consoles use AMD Jaguar CPU cores and the Radeon GCN architecture for graphics. It’s obvious, however, that much of the Xbox One’s APU die space was spent on the console’s eSRAM. As we discussed in this article, it’s clear that the choice to put the eSRAM on die (rather than on a daughter die) was costly for Microsoft in terms of space.
Supposedly the Xbox One’s APU remains fairly cool thanks to a cooling system that is vastly improved over the Xbox 360’s. The X360, as many will recall, had the rather infamous RROD (Red Ring of Death) issues, which were costly to Microsoft not just financially but also in the goodwill of gamers.
The touted 47MB of on-chip storage isn’t as impressive as it sounds. Much of it is taken up by the 32MB of eSRAM. Then you must account for the 4MB of level 2 CPU cache, the level 1 caches, the GPU caches and more. So in reality, this figure is to be expected.
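As a rough sanity check on that figure, the known pieces can be tallied up. This is only a sketch: it assumes the 32KB instruction and 32KB data caches quoted for the Jaguar are per core, and it simply lumps everything else (GPU caches and so on) into a remainder.

```python
# Rough tally of the on-chip storage figures quoted in this article (sizes in MB).
TOTAL_ON_CHIP_MB = 47
ESRAM_MB = 32
L2_CACHE_MB = 4             # 2MB per four-core cluster, two clusters
CORES = 8
L1_PER_CORE_KB = 32 + 32    # I-cache + D-cache, ASSUMED to be per core


def accounted_storage_mb():
    """Storage explained by the eSRAM plus the CPU's L1 and L2 caches."""
    l1_total_mb = CORES * L1_PER_CORE_KB / 1024
    return ESRAM_MB + L2_CACHE_MB + l1_total_mb


accounted = accounted_storage_mb()        # 36.5
remainder = TOTAL_ON_CHIP_MB - accounted  # 10.5, GPU caches and everything else
print(f"Accounted for: {accounted}MB, remainder: {remainder}MB")
```

Under those assumptions the eSRAM and CPU caches alone explain more than three quarters of the 47MB.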
Xbox One CPU – ‘AMD Jaguar’
The CPU inside the Xbox One is the low power AMD Jaguar. Originally, Microsoft had revealed the CPU to run at 1.6GHz, but at the last minute raised the clock speed by 150MHz, to 1.75GHz. The Jaguar can technically top out at 2GHz, but neither Sony nor MS has chosen to run its CPU at this speed; both have opted for a slower clock frequency to reduce power consumption, reduce heat output and increase yields. It’s thought Sony is still running its Jaguar at 1.6GHz, giving the Xbox One a slight advantage.
The Xbox One’s CPU is a 64-bit x86-64 CPU (the 64-bit part is important, allowing it to address more than 4GB of RAM, along with numerous other advantages). As you would expect from an x86 based CPU, it is fairly easy to program. Unlike the PowerPC architecture the previous generation of consoles used, these CPUs are CISC (Complex Instruction Set Computing) and have extra ‘instructions’ built in. These instructions make things such as memory operations much easier to program, but the cost is less die space for registers and other ‘bits and bobs’ that help CPU performance.
The CPU has 8 cores, each handling 1 hardware thread. They’re arranged in a 4 x 2 layout (four AMD Jaguar cores per module, with two modules total). Each cluster of four shares 2MB of level 2 cache (meaning 4MB total). They are out-of-order execution CPUs with advanced branch prediction. All of this means that they’re much better at ‘guessing’ which bits of code are going to come up next, and if something unexpected needs to be processed, there’ll be less wait time. This in turn means less reliance on efficient and correct compiling of the application. Although in a perfect world the application would be compiled perfectly, AI and other code which can branch off into dozens of ifs and elses can be extremely hard to predict and to load into the CPU’s cache or registers. Efficient OoO (Out-of-Order) execution and branch prediction are therefore essential for the Jaguar.
Two of these cores are reserved for the Xbox One’s system functions, leaving a total of 6 Jaguar cores available for games developers. It’s possible that in time, MS may decrease this reserve.
The Jaguar also comes equipped with 32KB of instruction cache (sometimes referred to as I-Cache) and an additional 32KB of data cache (D-Cache) per core. The AMD Jaguar of the Xbox One puts out around 112GFLOPS of computing power, which is around 10GFLOPS higher than that of the PlayStation 4. The Jaguar is an evolution of AMD’s ‘Bobcat’ architecture and features numerous improvements, one of which is about 15 percent higher performance at a given clock speed (that is, higher IPC, or Instructions Per Clock). It also supports SSE 4.1 and SSE 4.2, MOVBE, AVX and other instruction sets.
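The 112GFLOPS figure can be reproduced with some back-of-the-envelope arithmetic. The sketch below assumes each Jaguar core can perform 8 single-precision floating point operations per clock (an assumption about the core, not something stated in this article), and that the PS4 really does run at 1.6GHz.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=8):
    """Theoretical peak CPU throughput in GFLOPS.

    flops_per_cycle=8 is an ASSUMPTION about the Jaguar core
    (single precision), not a figure taken from the article.
    """
    return cores * clock_ghz * flops_per_cycle


xbox_one_cpu = peak_gflops(cores=8, clock_ghz=1.75)  # 112.0
ps4_cpu = peak_gflops(cores=8, clock_ghz=1.6)        # about 102.4

print(f"Xbox One: {xbox_one_cpu:.1f} GFLOPS")
print(f"PS4:      {ps4_cpu:.1f} GFLOPS")
print(f"Gap:      {xbox_one_cpu - ps4_cpu:.1f} GFLOPS")
```

These are peak numbers across all eight cores; with two cores reserved for the system, games see less than this.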
Xbox One GPU – ‘AMD Radeon GCN – Graphics Core Next Architecture’
The APU features 14 CUs (Compute Units), although only 12 are available due to yields. The two spares are there for fault tolerance. Each CU has a total of 64 shader processors, so in the case of the Xbox One you have 12 * 64, or 768 in total. The GPU runs at 853MHz, which is slightly higher than Microsoft originally announced; MS had originally said the Xbox One’s graphics processor would run at 800MHz. This slight increase means that the Xbox One’s GPU can put out around 1.31TFLOPS of compute power. For reference, the PlayStation 4 manages 1.84TFLOPS, running at 800MHz, thanks to 1152 shaders (18 * 64).
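Those TFLOPS figures fall straight out of the shader count and the clock speed. The sketch below assumes each shader can do 2 floating point operations per clock (one fused multiply-add), which is the usual way peak GPU throughput is quoted; the function and parameter names are just for illustration.

```python
def peak_tflops(compute_units, clock_mhz, shaders_per_cu=64, flops_per_shader=2):
    """Theoretical peak GPU throughput in TFLOPS.

    flops_per_shader=2 ASSUMES one fused multiply-add per shader per clock.
    """
    shaders = compute_units * shaders_per_cu
    return shaders * flops_per_shader * clock_mhz * 1e6 / 1e12


xbox_one_gpu = peak_tflops(compute_units=12, clock_mhz=853)  # about 1.31
ps4_gpu = peak_tflops(compute_units=18, clock_mhz=800)       # about 1.84

print(f"Xbox One: {xbox_one_gpu:.2f} TFLOPS")
print(f"PS4:      {ps4_gpu:.2f} TFLOPS")
```

The arithmetic gives roughly 1.31TFLOPS for the Xbox One and 1.84TFLOPS for the PS4, so the clock bump from 800MHz to 853MHz narrows the gap but doesn’t close it.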
The Xbox One features 16 ROPs, the units responsible for the final rendering of a scene. The PS4 features 32 ROPs, and many high end GPUs for PC feature at least 32 (usually 48 in modern high end cards). This can impact the Xbox One’s fill rate, but in theory the GPU should be able to process 1080p at 60FPS with only 16. It may, however, be costly for certain visual effects or anti-aliasing.
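To put some rough numbers on the fill rate point, here is a minimal sketch. It assumes each ROP can write one pixel per clock and uses the 853MHz and 800MHz GPU clocks quoted earlier; real games draw each pixel many times over (overdraw, post-processing, anti-aliasing), which is where the headroom goes.

```python
def pixel_fill_rate(rops, clock_mhz):
    """Peak pixels written per second, ASSUMING one pixel per ROP per clock."""
    return rops * clock_mhz * 1e6


def pixels_per_second(width, height, fps):
    """Pixels needed to draw every screen pixel exactly once per frame."""
    return width * height * fps


xbox_one_fill = pixel_fill_rate(rops=16, clock_mhz=853)  # about 13.6 Gpixels/s
ps4_fill = pixel_fill_rate(rops=32, clock_mhz=800)       # about 25.6 Gpixels/s
needed_1080p60 = pixels_per_second(1920, 1080, 60)       # 124,416,000 pixels/s

# How many times each pixel could be written per frame at 1080p60.
xbox_one_headroom = xbox_one_fill / needed_1080p60
ps4_headroom = ps4_fill / needed_1080p60

print(f"Xbox One headroom: {xbox_one_headroom:.0f}x, PS4 headroom: {ps4_headroom:.0f}x")
```

On these assumptions a single pass at 1080p60 is well within reach of 16 ROPs, but the PS4 has nearly twice the margin for effects that touch every pixel repeatedly.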
It sports 48 Texture Units (sometimes known as TUs), vs the PS4’s 72 Texture Units.
Finally, the Xbox One features 2 ACEs, each handling 8 queues (16 total). The PS4 features 8 ACEs, each handling 8 queues (64 total). ACEs (Asynchronous Compute Engines) are what handle issuing GPU compute operations to the shaders. Quite simply put, they store commands in queues and then tell the GPU’s shaders how each command should be processed and how many shaders should work on it. Efficient use of these Asynchronous Compute Engines is vital, since compute commands are best run while shaders aren’t busy processing graphics, to avoid impacting graphics performance.
The GPU itself is based on AMD’s Radeon GCN (Graphics Core Next) architecture, which is featured on many GPUs, such as the Radeon HD 7970 series. Although the Xbox One version doesn’t deliver as many TFLOPS as a high end desktop GPU (which are above 4TFLOPS), many of the technologies are similar. It uses a new instruction set in place of the VLIW (Very Long Instruction Word) design of AMD’s earlier GPUs, which makes it better suited to GPGPU computing.
Xbox One eSRAM
The X1’s eSRAM is a huge topic, and has several potential issues. I have covered many of them in this article. The Xbox One’s eSRAM is there to make up for the lack of memory bandwidth that comes with the console using DDR3. The DDR3’s 68GB/s of memory bandwidth just isn’t enough to feed the GPU, and so Microsoft included high speed memory to make up the gap. The issue is that there’s only 32MB of it, and some game developers are reporting problems.
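For the curious, the 68GB/s figure is what you get from bus width multiplied by transfer rate. The sketch below assumes a 256-bit memory bus and DDR3 running at 2133 million transfers per second; neither number is stated in this article, so treat both as assumptions.

```python
def memory_bandwidth_gbps(bus_width_bits, mega_transfers_per_sec):
    """Peak memory bandwidth in GB/s (1GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


# ASSUMED configuration: 256-bit bus, DDR3-2133.
ddr3_bandwidth = memory_bandwidth_gbps(bus_width_bits=256, mega_transfers_per_sec=2133)

print(f"DDR3 peak bandwidth: {ddr3_bandwidth:.1f} GB/s")  # roughly 68 GB/s
```

This is a theoretical peak; as with any memory system, sustained real-world bandwidth will be lower.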
A 1080p frame takes up a lot of memory in the frame buffer (which lives in eSRAM). Texture operations also use a lot of memory bandwidth, but once an object has been textured it can then be pulled from the slower DDR3 memory if required. This solution isn’t elegant, but developers are getting better at understanding the workflow. I’ve no doubt that Tiled Resources will be a huge performance boost for the Xbox One’s texturing, assuming enough game developers embrace it.
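To see why 1080p is a squeeze, it helps to work out how big a render target actually is. The sketch below assumes 4 bytes per pixel (a common 32-bit format) and treats the 32MB of eSRAM as 32 * 1024 * 1024 bytes; the numbers of targets tried are hypothetical examples, not taken from any particular game.

```python
ESRAM_BYTES = 32 * 1024 * 1024  # 32MB of eSRAM, treated as MiB


def render_target_bytes(width, height, bytes_per_pixel=4):
    """Size of a single render target. 4 bytes per pixel is an ASSUMPTION."""
    return width * height * bytes_per_pixel


def fits_in_esram(num_targets, width=1920, height=1080, bytes_per_pixel=4):
    """True if num_targets full-size render targets fit in eSRAM at once."""
    total = num_targets * render_target_bytes(width, height, bytes_per_pixel)
    return total <= ESRAM_BYTES


one_target_mb = render_target_bytes(1920, 1080) / (1024 * 1024)  # about 7.9
print(f"One 1080p target: {one_target_mb:.1f}MB")

for targets in (1, 4, 5):
    print(f"{targets} x 1080p targets fit: {fits_in_esram(targets)}")
```

Under these assumptions four 1080p targets just fit and a fifth does not, whereas `fits_in_esram(5, 1600, 900)` returns True. That gives a feel for why a renderer that wants several full-size buffers at once has to juggle what lives in eSRAM.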
The eSRAM takes up a lot of space on the Xbox One’s APU, and the alternatives were to use either GDDR5 (which Sony did with the PS4), or a second daughter die containing 128MB of RAM. This would have given game developers more wiggle room. Time will tell whether the eSRAM is just causing issues at launch, or whether it will continue to be a thorn in the Xbox One’s side. The eSRAM manages a peak of around 204GB/s of memory bandwidth, although developers are stating that in the real world around 150 to 160GB/s is achievable and sustainable.
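A quick bit of arithmetic puts those bandwidth figures in context. This sketch uses only the numbers quoted in this article (204GB/s peak, 150 to 160GB/s sustained, 68GB/s for the DDR3); the function names are just for illustration.

```python
ESRAM_PEAK_GBPS = 204
DDR3_GBPS = 68


def fraction_of_peak(sustained_gbps, peak_gbps=ESRAM_PEAK_GBPS):
    """Share of the theoretical eSRAM peak that is actually sustained."""
    return sustained_gbps / peak_gbps


def speedup_over_ddr3(sustained_gbps, ddr3_gbps=DDR3_GBPS):
    """How many times faster than the DDR3 figure a sustained rate is."""
    return sustained_gbps / ddr3_gbps


for sustained in (150, 160):
    print(
        f"{sustained}GB/s is {fraction_of_peak(sustained):.0%} of peak "
        f"and {speedup_over_ddr3(sustained):.1f}x the DDR3 figure"
    )
```

In other words, the reported real-world figures are somewhere around three quarters of the peak, and still more than double what the DDR3 can offer on its own.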
It’s worth noting that developers aren’t complaining about the memory bandwidth; most of their attention seems focused on the small amount of eSRAM rather than its speed. It appears developers will need to develop routines to ensure that they’re making full use of the 32MB of space. Efficient use of space is likely (at least for now) to be far more important than use of the eSRAM’s memory bandwidth.