There’s a good chance you’ve heard a lot about the insides of the Xbox One, but much is still unclear regarding its CPU, ESRAM and graphics API. Microsoft have joined forces with AMD (who are responsible for creating many of the chips inside the machine) to hold an event explaining much of the jiggery-pokery inside Microsoft’s flagship console.
As you’re likely aware by now, the Xbox One uses AMD’s Jaguar CPU cores (the same type that powers Sony’s PS4), arranged in two clusters of four (eight cores in total) and running at 1.75 GHz. The cores are dual issue (meaning the processor can issue up to two instructions per clock cycle) and support both out-of-order and speculative execution. Jaguar also supports SSE4.2 and AVX, just for good measure. For cache, there’s 4 MiB of L2 in total (2 MiB per cluster), which is 16-way set associative, and each core can have multiple cache requests in flight simultaneously.
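To make the cache numbers a little more concrete, here is a small worked example of the L2 geometry. It assumes a 64-byte cache line (the usual x86 line size; the figure isn’t given above) and simply derives how many sets a 2 MiB, 16-way cache has and how an address splits into tag, set index and byte offset. It’s an arithmetic sketch under a simple modulo-indexing model, not a description of Jaguar’s actual indexing hardware.

```cpp
#include <cstdint>

// Figures from the text: 2 MiB of L2 per four-core cluster, 16-way set associative.
// ASSUMPTION: 64-byte cache lines (the usual x86 line size; not stated in the source).
constexpr std::uint64_t kL2BytesPerCluster = 2ull * 1024 * 1024;
constexpr std::uint64_t kWays = 16;
constexpr std::uint64_t kLineBytes = 64;

// Sets = capacity / (ways * line size) = 2048 under these assumptions.
constexpr std::uint64_t kSets = kL2BytesPerCluster / (kWays * kLineBytes);

// Integer log2 for powers of two (C++14 constexpr loop).
constexpr unsigned log2_pow2(std::uint64_t v) {
    unsigned bits = 0;
    while (v > 1) {
        v >>= 1;
        ++bits;
    }
    return bits;
}

constexpr unsigned kOffsetBits = log2_pow2(kLineBytes);  // 6 bits pick the byte in a line
constexpr unsigned kIndexBits = log2_pow2(kSets);        // 11 bits pick the set

struct CacheAddress {
    std::uint64_t tag;
    std::uint64_t set;
    std::uint64_t offset;
};

// Split an address into tag / set index / byte offset (simple modulo indexing model).
constexpr CacheAddress decompose(std::uint64_t addr) {
    return CacheAddress{addr >> (kOffsetBits + kIndexBits),
                        (addr >> kOffsetBits) & (kSets - 1),
                        addr & (kLineBytes - 1)};
}

// Addresses exactly kSets * kLineBytes (128 KiB) apart map to the same set, so more
// than 16 such lines cannot all stay cached at once.
static_assert(decompose(0x12345).set == decompose(0x12345 + kSets * kLineBytes).set,
              "addresses 128 KiB apart share a set");
```

Under this model, data spaced exactly 128 KiB apart competes for the same 16 ways, which is worth keeping in mind when laying out large arrays.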
Interestingly, Microsoft cautions that branch prediction (the CPU’s way of ‘guessing’ which way a branch will go before its condition is known; if/else statements are the classic example) is not a crystal ball. Branchless tricks (used to great effect in the Xbox 360 era) are recommended instead, since code without a branch has nothing to mispredict. This ties in with what Naughty Dog have said previously about the PS4: the out-of-order (OoO) nature of the CPU and its branch predictor can actually make optimization harder. In-order CPUs (such as those found in Sony’s PS3 and Microsoft’s Xbox 360) execute instructions strictly in sequence, which places much more emphasis on the compiler to schedule instructions well. It largely comes down to optimization: an in-order CPU is harder to work with but offers greater scope for optimization, whereas an OoO CPU is the reverse. For more info, check out the PS3 / Cell CPU post-mortem.
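As a rough illustration of the kind of branchless trick being recommended, the sketch below counts values above a threshold with and without a conditional branch, and shows a mask-based select that replaces an if/else. The function names are hypothetical, and whether the “branchy” version really compiles to a jump depends on the compiler and optimization flags (modern compilers often convert simple cases like this themselves).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchy version: on unpredictable data the `if` is mispredicted often,
// and each miss throws away speculatively executed work.
int count_above_branchy(const std::int32_t* v, std::size_t n, std::int32_t threshold) {
    assert(v != nullptr || n == 0);
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] > threshold) {
            ++count;
        }
    }
    return count;
}

// Branchless version: the comparison yields 0 or 1, which is added directly,
// so the loop body has no data-dependent jump to predict.
int count_above_branchless(const std::int32_t* v, std::size_t n, std::int32_t threshold) {
    assert(v != nullptr || n == 0);
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<int>(v[i] > threshold);
    }
    return count;
}

// Branchless select: build an all-ones or all-zeros mask from the condition
// and blend the two values with bitwise operations instead of an if/else.
std::int32_t select_branchless(bool cond, std::int32_t a, std::int32_t b) {
    const std::int32_t mask = -static_cast<std::int32_t>(cond);
    return (a & mask) | (b & ~mask);
}
```

Both counting functions return the same result; the difference is only in whether the CPU has to guess anything along the way.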
Going wide (parallel) with SSE and making sure all of the CPU cores are kept busy is said to be a “no brainer”, which is hardly a surprise given the relatively low performance of each individual core. Multi-threaded optimization has been a hot topic on next-generation consoles, and the Xbox One is no different.
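A minimal sketch of going wide on both levels: the SSE loop below processes four floats per instruction for a y += a * x update, and a helper splits an array evenly so that each core could take its own slice. It is x86-only, the names are hypothetical, and the actual threading (a job system, for instance) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// y[i] += a * x[i], four floats per SSE instruction.
void saxpy_sse(float a, const float* x, float* y, std::size_t n) {
    const __m128 va = _mm_set1_ps(a);  // broadcast a into all four lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 vx = _mm_loadu_ps(x + i);  // unaligned load of x[i..i+3]
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i) {  // scalar tail for the last 0-3 elements
        y[i] += a * x[i];
    }
}

// Hypothetical helper: split n elements as evenly as possible across `cores`
// workers, so each core can run saxpy_sse on its own slice.
struct Range {
    std::size_t begin;
    std::size_t end;
};

Range slice_for_core(std::size_t n, std::size_t cores, std::size_t core) {
    assert(cores > 0 && core < cores);
    const std::size_t base = n / cores;
    const std::size_t extra = n % cores;  // the first `extra` cores take one more element
    const std::size_t begin = core * base + (core < extra ? core : extra);
    const std::size_t length = base + (core < extra ? 1 : 0);
    return Range{begin, begin + length};
}
```

Each worker would then call `saxpy_sse(a, x + r.begin, y + r.begin, r.end - r.begin)` for its own range `r`, so the SIMD width and the core count multiply together.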
While much of the information on the Xbox One’s GPU at the start of the lectures is well known, we’ll cover it anyway for the sake of completeness before moving on to the ‘new stuff’. The Xbox One’s GPU uses AMD’s GCN architecture, with 768 shader processing units running at 853 MHz, and two ACEs (Asynchronous Compute Engines), each handling two compute queues. Additionally, the GPU handles three hardware display planes, each with its own independent resolution and frame rate (which is how the Xbox One’s tiled interface can exist). It also features hardware video encoding and decoding, as well as exact sRGB support.
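Those GPU figures translate into the commonly quoted peak compute number with one line of arithmetic: each GCN shader ALU can perform one fused multiply-add (counted as two floating-point operations) per clock. The helper name is ours, and the result is a theoretical peak rather than sustained throughput.

```cpp
// Peak single-precision throughput of a GCN GPU: each shader ALU can retire
// one fused multiply-add (counted as 2 FLOPs) per clock.
constexpr double peak_gflops(int shader_alus, double clock_mhz) {
    return shader_alus * 2.0 * clock_mhz / 1000.0;  // MFLOPS -> GFLOPS
}

// Xbox One figures from the text: 768 ALUs at 853 MHz.
constexpr double kXboxOneGflops = peak_gflops(768, 853.0);  // ~1310 GFLOPS (~1.31 TFLOPS)
```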
The Xbox One’s 32 MiB of general-purpose ESRAM runs at 102 GiB/s (which is said to be ‘sometimes faster in practice’, and can indeed spike up to a theoretical 204 GB/s). Microsoft places great emphasis on there being zero contention: the CPU (for instance) doesn’t access the ESRAM, so the Xbox One’s GPU is free to go to town and use this limited resource as it requires. ESRAM “Makes everything better”; common uses include render targets, compute tasks, textures and geometry. The 8 GiB of DDR3 does indeed run at 68 GB/s, and is described as “Low Latency”, however there is “Not enough bandwidth to touch all of memory a frame, RAM is super-fast cache”.
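The “not enough bandwidth to touch all of memory a frame” remark is easy to check with some back-of-the-envelope arithmetic. The sketch below treats the quoted bandwidths as decimal GB/s (10^9 bytes per second), which is an assumption given the mixed units above, and uses peak figures, so real-world numbers would be lower.

```cpp
// Peak bytes a memory pool can move in one frame: bandwidth / frame rate.
// ASSUMPTION: bandwidths are decimal GB/s (1e9 bytes per second) and are peak figures.
constexpr double bytes_per_frame(double gb_per_second, double frames_per_second) {
    return gb_per_second * 1e9 / frames_per_second;
}

constexpr double kDdr3Bytes = 8.0 * 1024 * 1024 * 1024;  // 8 GiB of DDR3
constexpr double kEsramBytes = 32.0 * 1024 * 1024;       // 32 MiB of ESRAM

// Fraction of the DDR3 pool that 68 GB/s can touch once per frame.
constexpr double ddr3_fraction_per_frame(double fps) {
    return bytes_per_frame(68.0, fps) / kDdr3Bytes;
}

// How many times the whole ESRAM could be swept per frame at 102 GB/s.
constexpr double esram_sweeps_per_frame(double fps) {
    return bytes_per_frame(102.0, fps) / kEsramBytes;
}
```

At 60 fps, peak DDR3 bandwidth covers only about 13% of the 8 GiB pool per frame (about 26% at 30 fps), while ESRAM bandwidth is enough to sweep the whole 32 MiB roughly 50 times, which is why it behaves more like a cache than like main memory.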
Microsoft also outlined “the four stages of adoption” for the ESRAM, which indicate how developers should use it. The first is allocating a ‘small number’ of render targets in the ESRAM (generally the depth/stencil buffer first, followed by color targets, then other data). A render target is clearly a prime candidate for the ESRAM: it holds an image of the scene while it’s being drawn, and the GPU (pixel shaders, for example) reads and writes it heavily within a frame. The second stage is aliasing, where the same region of ESRAM is given an ‘alias’ so it can be reused again and again for different resources that aren’t needed at the same time.
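To see why the first two stages matter, here’s a hypothetical 1080p deferred-rendering frame (the targets, formats and pass numbers are illustrative assumptions, not taken from Microsoft’s material). The sketch totals the memory needed with and without aliasing. Aliasing is modeled as greedy interval colouring: a target may reuse the memory of one whose last use has already passed. To keep it short, it only handles equal-sized targets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical render target: bytes per pixel plus the first and last
// render pass (by index) in which it is used.
struct RenderTarget {
    const char* name;
    std::uint32_t bytes_per_pixel;
    int first_pass;
    int last_pass;
};

constexpr std::uint64_t kEsramBytes = 32ull * 1024 * 1024;  // 32 MiB

// Without aliasing, every target gets its own block of memory.
std::uint64_t bytes_without_aliasing(const std::vector<RenderTarget>& rts,
                                     std::uint64_t pixels) {
    std::uint64_t total = 0;
    for (const RenderTarget& rt : rts) {
        total += rt.bytes_per_pixel * pixels;
    }
    return total;
}

// With aliasing, targets whose lifetimes don't overlap share the same slot.
// Greedy interval colouring: visit targets in order of first use and reuse
// any slot whose previous occupant has already finished. For simplicity this
// sketch requires every target to have the same size.
std::uint64_t bytes_with_aliasing(std::vector<RenderTarget> rts, std::uint64_t pixels) {
    assert(!rts.empty());
    std::sort(rts.begin(), rts.end(),
              [](const RenderTarget& a, const RenderTarget& b) {
                  return a.first_pass < b.first_pass;
              });
    std::vector<int> slot_last_pass;  // last pass of each slot's current occupant
    for (const RenderTarget& rt : rts) {
        assert(rt.bytes_per_pixel == rts.front().bytes_per_pixel);
        auto slot = std::find_if(slot_last_pass.begin(), slot_last_pass.end(),
                                 [&](int last) { return last < rt.first_pass; });
        if (slot != slot_last_pass.end()) {
            *slot = rt.last_pass;  // alias a finished target's memory
        } else {
            slot_last_pass.push_back(rt.last_pass);  // no free slot: allocate a new one
        }
    }
    return slot_last_pass.size() * rts.front().bytes_per_pixel * pixels;
}

// Hypothetical 1080p frame: pass 0 = G-buffer, 1 = lighting, 2 = post-processing.
std::vector<RenderTarget> example_frame() {
    return {
        {"depth/stencil (D24S8)", 4, 0, 1},
        {"albedo (RGBA8)", 4, 0, 1},
        {"normals (RGBA8)", 4, 0, 1},
        {"lighting (R11G11B10F)", 4, 1, 2},
        {"final (RGBA8)", 4, 2, 2},
    };
}
```

With these assumptions the five targets need 41,472,000 bytes at 1920x1080, more than the 33,554,432 bytes of ESRAM; letting the final output alias the depth buffer’s memory brings it down to four slots, or 33,177,600 bytes, which just fits.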
These first two stages were understood fairly quickly by developers and were adopted in the first wave of titles released on the system (the Xbox One’s launch titles). Microsoft believe second-wave games are using the third and fourth stages. The third stage is partial residency, where not all of a render target is held in the ESRAM. Parts that can tolerate slower memory are placed in DRAM (in other words, the 8 GiB of DDR3), while the remainder is squeezed into the ESRAM. The example given was the sky being held in the DRAM (DDR3) while the rest was crammed into ESRAM.
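Partial residency can be pictured with a simplified model in which a render target is split by rows: the top rows (where the sky typically is) stay in DDR3, and as many of the remaining rows as the ESRAM budget allows go into ESRAM. The row-based split and the names below are hypothetical; the source doesn’t say at what granularity the hardware actually maps memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical row-based model of partial residency: the top rows of a render
// target (e.g. mostly sky) stay in DDR3 and the rest goes into ESRAM.
struct RowSplit {
    std::uint32_t dram_rows;   // top rows kept in DDR3
    std::uint32_t esram_rows;  // remaining rows placed in ESRAM
};

RowSplit split_rows(std::uint32_t width, std::uint32_t height,
                    std::uint32_t bytes_per_pixel, std::uint64_t esram_budget_bytes) {
    assert(width > 0 && bytes_per_pixel > 0);
    const std::uint64_t row_bytes = std::uint64_t{width} * bytes_per_pixel;
    std::uint64_t rows_that_fit = esram_budget_bytes / row_bytes;
    if (rows_that_fit > height) {
        rows_that_fit = height;  // the whole target fits
    }
    const auto esram_rows = static_cast<std::uint32_t>(rows_that_fit);
    return RowSplit{height - esram_rows, esram_rows};
}
```

For example, a 1920x1080 target at 8 bytes per pixel (such as RGBA16F) with a hypothetical 12 MiB of ESRAM left over gives `split_rows(1920, 1080, 8, 12ull * 1024 * 1024)`, which places 819 rows in ESRAM and the top 261 rows in DRAM.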
The final stage is asynchronously DMAing resources in and out of the ESRAM, which could well leverage both compute and the Xbox One’s Move Engines. This keeps a steady flow of the most important data into the ESRAM and helps make the most of its limited size. Because this happens while you’re rendering, it’s important for developers to map out their memory and plan everything in advance, which requires good knowledge of the hardware. Microsoft believe we’ll see developers using this final technique in the ‘3rd wave’ of titles, but a few second-wave games (likely first-party ones) are already using it.
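Why asynchronous DMA pays off can be shown with a simple timing model. If each chunk of data must be copied into ESRAM before the GPU can use it, doing the copies one after another adds their full cost, whereas double buffering hides each copy behind the work on the previous chunk. The sketch below is a hypothetical model that ignores copy-out and assumes the DMA engine and GPU overlap perfectly; the chunk timings are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial: DMA chunk i into ESRAM, then process it, then move on to the next.
double serial_ms(const std::vector<double>& copy_ms, const std::vector<double>& work_ms) {
    assert(copy_ms.size() == work_ms.size());
    double total = 0.0;
    for (std::size_t i = 0; i < copy_ms.size(); ++i) {
        total += copy_ms[i] + work_ms[i];
    }
    return total;
}

// Double-buffered: while the GPU works on chunk i in one ESRAM buffer, the DMA
// engine fills the other buffer with chunk i + 1, so each step costs whichever
// of the two takes longer.
double pipelined_ms(const std::vector<double>& copy_ms, const std::vector<double>& work_ms) {
    assert(!copy_ms.empty() && copy_ms.size() == work_ms.size());
    double total = copy_ms[0];  // the first copy has nothing to hide behind
    for (std::size_t i = 0; i + 1 < copy_ms.size(); ++i) {
        total += std::max(work_ms[i], copy_ms[i + 1]);
    }
    return total + work_ms.back();  // the last chunk has nothing left to prefetch
}
```

With four chunks that each take 2 ms to copy and 3 ms to process, `serial_ms` gives 20 ms and `pipelined_ms` gives 14 ms; only the first copy remains exposed.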
Microsoft placed great emphasis on swizzling textures (which can be done as part of a copy on the Xbox One’s Move Engines). Swizzling a texture here means altering the byte ordering of its colors. In computer graphics most things are stored in an RGBA format (that’s red, green, blue and finally alpha channels), but you might prefer to use, say, BGRA or ARGB. Swizzling puts the bytes into the order the API or platform expects, so the same color values end up laid out exactly as the consumer wants to read them and nothing needs converting later. Why is that so important? The simple answer is transfer speed. Microsoft also advise against using the CPU-coherent bus for reading DRAM, likely to avoid saturating it.
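The channel reordering described here can be sketched on the CPU in a few lines; on the Xbox One it would instead happen as part of a Move Engine copy. The function and constant names are hypothetical. Note that “swizzle” is also commonly used for rearranging texels into a tiled memory layout; this sketch only covers the channel-order meaning.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// order[k] names the source channel that ends up in destination channel k.
using ChannelOrder = std::array<int, 4>;

constexpr ChannelOrder kRgbaToBgra = {{2, 1, 0, 3}};  // B, G, R, A
constexpr ChannelOrder kRgbaToArgb = {{3, 0, 1, 2}};  // A, R, G, B

// Reorder the 8-bit channels of every 4-byte pixel; the color values themselves
// are unchanged, only their position within each pixel moves.
std::vector<std::uint8_t> swizzle(const std::vector<std::uint8_t>& src,
                                  const ChannelOrder& order) {
    assert(src.size() % 4 == 0);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t p = 0; p < src.size(); p += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            dst[p + k] = src[p + static_cast<std::size_t>(order[k])];
        }
    }
    return dst;
}
```

For instance, swizzling the RGBA pixel {10, 20, 30, 40} with `kRgbaToBgra` yields {30, 20, 10, 40}.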
Xbox One’s Graphics API
DirectX 11 was designed in 2008, six years ago at the time of writing. As we know, it’s not a low-level API; it provides a lot of hardware abstraction so game developers can easily program for a large variety of GPUs. The Xbox One can indeed run ‘vanilla’ DX11 PC code, which makes porting titles from PC to the console fairly easy, but for optimization there are extensions that provide low-level access to the Xbox One’s specific hardware. In other words, they allow developers to ‘code to the metal’, a phrase that is becoming increasingly popular these days.
While Microsoft have confirmed DX12 will be heading to both the PC and the Xbox One, some DX12 features are already available on the X1. This means developers are free to use Draw Bundles and Deferred Contexts, and to disable hazard tracking. For more information on DX12, please head over to our analysis here.
While low-level optimization is unlikely to be required for simpler indie titles, it clearly becomes much more important for a game pushing the Xbox One’s hardware close to its limits. The fact that raw DX11.1 code works on the Xbox One is likely to encourage a lot of indie development, and it also means developers working on the PC version of a title can get a quick and dirty version running on the Xbox One’s hardware very quickly. Even triple-A studios could in theory use this approach to get a game running in a basic form and understand what needs tweaking for the hardware. It’s also great for ‘proof of concept’ work: testing out basic engine functionality (such as physics), simple gameplay mechanics and the like.
As we’d expect, the AMD Jaguar CPU is nowhere near capable of saturating the 68 GB/s of DDR3 memory bandwidth by itself. There were signs of this all along. First, the CPU isn’t as fast as a PC’s CPU, which typically makes do with far less bandwidth and yet isn’t starved unless it shares that bandwidth with a built-in GPU (for games, say). The next clue came from Sony’s PlayStation 4, whose leaked documents showed a bus of roughly 20 GB/s used for CPU access.
Clearly there’s a huge performance issue if there’s DRAM contention (for instance, if the GPU tries to gobble up all of the DDR3’s bandwidth). In their document, Microsoft use a play on old BASIC programming to get their point across:
10. Use ESRAM as much as possible.
20. Leave DRAM for the CPU and DMA.
30. Goto 10.
Xbox One’s Three Operating Systems
One of the more surprising features of the X1 is the rather unique design of its OS, in that it’s running three at once. The first is the hypervisor, a lightweight OS that effectively controls and runs the other two. Microsoft label these ERA and SRA. ERA (Exclusive Resource Allocation) runs a custom OS and can only have one active app at a time. Its job is to run the Xbox One’s games, so it clearly gets the lion’s share of the memory, GPU and CPU resources available.
SRA (Shared Resource Allocation) is based on a Windows 8 core (those who’ve used both will likely recognize the tiled interface). Its job is to run apps. When the console was first announced, there was much confusion over Microsoft’s decision to go with a three-OS approach, but their reasoning was fairly sound: the OS responsible for running apps can be updated, changed and adjusted as required throughout the console’s life without the danger of breaking compatibility with older games.
The ERA can run in one of a few different states. The first is full screen, meaning all of the resources are available to the game; this applies even when an app is ‘snapped’. The second is ‘constrained’: the RAM allocation doesn’t shrink, but CPU and GPU resources are reduced slightly, since the user isn’t interacting with the game and a slight drop in performance won’t impact things. Finally, there’s suspended. In this state the game is effectively halted on the CPU and GPU (it’s using zero resources), but it’s still resident in memory, using the same amount of RAM.
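The three ERA states can be summarized as a small lookup table. This is only a model of the description above: the enum and function names are hypothetical, and since no exact figures are given for the constrained state, its reduction is left qualitative.

```cpp
// Hypothetical model of the ERA states described above; no exact percentages
// are given, so "SlightlyReduced" is deliberately vague.
enum class EraState { FullScreen, Constrained, Suspended };
enum class Share { Full, SlightlyReduced, None };

struct EraResources {
    Share cpu;
    Share gpu;
    bool ram_unchanged;  // the game's RAM allocation is the same in every state
};

constexpr EraResources resources_for(EraState state) {
    switch (state) {
        case EraState::FullScreen:   // also applies when an app is snapped
            return EraResources{Share::Full, Share::Full, true};
        case EraState::Constrained:  // user is not interacting with the game
            return EraResources{Share::SlightlyReduced, Share::SlightlyReduced, true};
        case EraState::Suspended:    // halted on CPU and GPU, still resident in memory
            return EraResources{Share::None, Share::None, true};
    }
    return EraResources{Share::None, Share::None, true};  // unreachable for valid states
}
```

The one constant across all three rows is RAM: only CPU and GPU time change as the game moves between states.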