The Xbox One will be receiving a nice boost in performance thanks to additional tools available for developers to profile the best way to use the ESRAM memory. Regarding Microsoft’s new API, the Xbox One has been said to benefit largely from lower overhead, a new API for running the ESRAM and greater parallelism.
Brad Wardell (Stardock’s CEO) has taken to Twitter to discuss how Microsoft will be changing development on its console. Combining this information with the DX12 GDC presentation, we can start making a few educated guesses about the Xbox One’s API future and thus its performance.
According to Wardell via a tweet, “XBO resolution largely based on how well the game uses esram. Dx11 esram API is difficult, requires a lot of iteration”. If this statement is accurate, the Xbox One’s GPU isn’t the main culprit for the lower resolution (the whole resolution gate situation); the ESRAM is. If you’re unfamiliar with what “iteration” is, a common form of iteration is a loop, which repeats code to determine values for multiple variables or sometimes just a single variable. For a really simple example, take three variables: A, B and C. Both A and B are assigned the number 1, while C holds the result of A+B. In short, A+B=C would be 1+1, and the result (2) is stored in C. This loop could continue to run until a set condition is met, for instance until the value stored in C adds up to 8.
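The loop described above can be sketched in a few lines of Python. This is purely an illustration of what a loop is: the variable names and the rule that each pass adds A + B onto C are our own assumptions, and none of it comes from Wardell or from any Xbox One code.

```python
# Hypothetical illustration of the loop described in the text.
# Assumption: each pass adds A + B onto the running total stored in C.
a = 1
b = 1
c = 0        # running result
passes = 0   # how many times the loop body has run

# Keep repeating until the stored result reaches the target of 8.
while c < 8:
    c += a + b   # 1 + 1 is added on every pass
    passes += 1

print(f"C = {c} after {passes} passes")
```

Under those assumptions the condition is met once C reaches 8, which takes four passes.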
Using this information, Brad indicates that a lot of this iteration won’t be required under the new API (possibly leveraging the fact that DX12 isn’t so reliant on various loops of code, which we’ve discussed already in the early GDC 2015 analysis of DX12). This will free up more RAM, bandwidth and other ‘bits’ for other information (in theory, and with some speculation).
Brad Wardell also confirms what many have suspected about the PS4, that the GDDR5 RAM gives a rather large advantage in terms of achieving “good perf” (good performance) – simply because of the memory architecture. The PS4’s RAM provides 176GB/s (peak and not including OS overheads) total bandwidth to the entire system. The Xbox One uses slower DDR3 RAM (max speed of 68GB/s) and to make up for the slower RAM, Microsoft included ESRAM.
The ESRAM currently requires trial and error: developers are forced to decide which assets should be placed in the Xbox One’s slower DDR3 main RAM and which should be stored in the 32MB of ESRAM. The Xbox One currently supports multiple render targets (we discussed that in part 3 of our SDK leak analysis) from both the ESRAM and DDR3. We said then, “70 percent of the image (roughly) is held in ESRAM, with the ESRAM hitting 11455K (11MB) vs the DRAM’s 4864K (just under 5MB).” Even in Microsoft’s own SDK documentation, they say “optimizing to reduce memory bandwidth usage [of the ESRAM] is ‘a key strategy for Xbox One’.”
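As a quick sanity check on the “roughly 70 percent” figure, the split can be worked out from the two sizes quoted above. This is a small sketch that assumes “K” means kilobytes and that 1MB is 1024K; the variable names are ours.

```python
# Sizes quoted from the SDK analysis, assumed to be in kilobytes.
esram_kb = 11455
dram_kb = 4864

total_kb = esram_kb + dram_kb
esram_share = esram_kb / total_kb   # fraction of the image held in ESRAM

# Convert to megabytes, assuming 1 MB = 1024 KB.
esram_mb = esram_kb / 1024
dram_mb = dram_kb / 1024

print(f"ESRAM: {esram_mb:.2f} MB, DRAM: {dram_mb:.2f} MB")
print(f"Share held in ESRAM: {esram_share:.1%}")
```

The result comes out at just over 70 percent, which matches the rough figure given in the quote.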
Once overheads for the Xbox One are taken into account, the GPU is left with about 42GB/s of bandwidth from the DRAM. Peak real-world figures for the ESRAM are about 133GB/s (but that is the best-case scenario). Furthermore, copying data from the ESRAM to, say, DRAM is far slower than this, as the copy can only run as fast as the DRAM can operate. 68GB/s is the highest number that’s achievable, but in reality it’ll be far less, as the 68GB/s assumes there’s literally no other code running, including no operating system overhead (and that’s far from realistic).
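To put those bandwidth figures into per-frame terms, here is a rough back-of-the-envelope sketch. It assumes decimal gigabytes (1GB = 1,000,000,000 bytes), treats the 30 and 60 frames per second targets as our own examples, and ignores latency and contention entirely, so treat the numbers as ballpark only.

```python
GB = 1_000_000_000   # assumption: decimal gigabytes
MB = 1_000_000

# Bandwidth figures quoted in the article, in GB/s.
bandwidth_gb_s = {
    "DDR3 theoretical peak": 68,
    "DDR3 after overheads": 42,
    "ESRAM best case": 133,
}

def per_frame_mb(gb_per_s, fps):
    """Data that can be moved in one frame at the given frame rate, in MB."""
    return gb_per_s * GB / fps / MB

def copy_time_ms(size_bytes, gb_per_s):
    """Time to move size_bytes at the given bandwidth, in milliseconds."""
    return size_bytes / (gb_per_s * GB) * 1000

for label, bw in bandwidth_gb_s.items():
    print(f"{label}: {per_frame_mb(bw, 30):.0f} MB/frame at 30fps, "
          f"{per_frame_mb(bw, 60):.0f} MB/frame at 60fps")

# Copying the full 32MB of ESRAM out to DRAM is limited by the DRAM side.
esram_bytes = 32 * 1024 * 1024
print(f"32MB copy at 42GB/s: {copy_time_ms(esram_bytes, 42):.2f} ms")
```

At 60 frames per second, 42GB/s works out to about 700MB of traffic per frame, which helps show why developers care so much about what lives in the ESRAM.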
PIX (Performance Investigator for Xbox) is a tool used by developers to find out what’s happening with their code running on a machine. For example, it can show how much of a CPU core the main render thread is using, or how much RAM a bit of AI code is consuming. It’s improved multiple times since the early days of the SDK (from what we’ve read in the leak, the early PIX… heck, the early SDK was pretty bad) but it still has weaknesses.
Microsoft realized this, and has finally started implementing tools in PIX that let developers simulate different ESRAM strategies and figure out the best way of using it. Brad Wardell says in a blog post, “This is where DirectX 12 comes in. First, they’re redoing the APIs to deal with eSRAM based on developer feedback. Second, they have a wonderful tool called “Pix” that now has a feature (or will very soon) that lets developers try out different strategies in using eSRAM and see how it performs without having to rebuild the project (i.e. they can simulate it in the tool). This too is huge.”
He also added to this via his Twitter account: “Microsoft has cool new tool for optimizing esram use called Pix. Demo I saw resulted in ~15% boost.” While fifteen percent might not sound astounding, you should remember that this is only the current figure, and it doesn’t take into account some of the other optimizations the Xbox One will receive.
“Dx12 for xbo delivers bundles, greater, less overhead, parallelism and a new esram API,” says Mr Wardell in yet another tweet. This is further clarified in his blog post (linked above), which points out that DirectX 12 removes the serialization requirements of DX11, and thus things will switch (in the longer term) to a fully asynchronous scheduler.
If you look at the Unity section of the GDC 2015 analysis, where they discussed multi-threading shadow maps, it gives you a general idea of what the problems were. Instructions were often serialized onto a single main thread (core 0), leaving the other CPU cores with little to do. Parallel processing uses multiple processing elements to solve a problem simultaneously. So with the shadow map example, the main thread tells the other threads “hey, we need this done” (splitting the work between them), and then the data is ‘synced’ at the end when all threads have finished… in other words, once all of the threads have finished their work, the application can compile all the results and send that off to the GPU to draw.
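The split, work and sync pattern described above can be sketched with Python’s standard thread pool. This is only a toy model of the structure: `build_shadow_commands` and `build_frame` are hypothetical names of our own, the “commands” are just strings, and nothing here is Unity or DirectX code. Python threads also won’t give a real speed-up for CPU-bound work, so the point is the shape of the pattern rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def build_shadow_commands(light_id):
    """Hypothetical stand-in for the per-light shadow map work.

    Returns a fake list of draw commands for one light.
    """
    return [f"light{light_id}:draw{n}" for n in range(3)]

def build_frame(light_ids, workers=4):
    """Split the shadow work across worker threads, then sync the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The main thread hands each light to a worker ("hey, we need this done").
        # map() returns results in input order; leaving the with-block waits
        # for every worker to finish, which is the sync point.
        per_light = list(pool.map(build_shadow_commands, light_ids))

    # Compile all the results into one list, ready to send off for drawing.
    return [cmd for cmds in per_light for cmd in cmds]

frame_commands = build_frame(range(4))
print(len(frame_commands), "commands ready for submission")
```

In the serialized case, the same four calls would simply run one after another on the main thread while the other cores sat idle.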
“Right now the ONLY 3D engine I’m aware of that has a fully asynchronous scheduler is Nitrous (and hence, why Oxide/Stardock have been getting so much attention in recent months in the graphics scene). This means we fill the GPU from all CPU cores simultaneously,” says Brad. He does indicate that Unity, Unreal, Cryengine and Frostbite engines are all coming along with this stuff however, so it’s only a matter of time.
For the PC, Wardell points out that “Ashes of the Singularity” (a new title from Stardock, his studio) has received a 70 percent boost in performance when rendering the same scene. The big news here, though, is that there’s still backwards compatibility with DX11, and according to Wardell that means they’re losing performance. He’s pretty convinced that if they had designed the project from the ground up with only DX12 in mind, their performance would have been higher. While we’re unaware of the exact system specs, they’re running an R9 200 series GPU and an Intel i7-5960X (which handles up to 16 threads).
All of this is still rather early, and as I’d mentioned in yesterday’s GDC DX12 early analysis video, a lot of documentation is still to come out to the public. Once it does, we’ll have a much better understanding of how the Xbox One will benefit… but it’ll likely take a year or two before developers’ engines really catch on to being multi-threaded.
Covering old ground, it’s important to remember this isn’t going to give the Xbox One’s GPU specs a ‘boost’. The hardware still puts out 1.3TFLOPS of performance, thanks to the 768 shaders. It also still has fewer ROPs and TMUs than the PS4. But what Microsoft can do is get more out of what they have in the system, and that’s the key point here.
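For anyone wondering where the 1.3TFLOPS figure comes from, it can be roughly reproduced from the shader count. Note that the 853MHz GPU clock and the two operations per shader per clock (a fused multiply-add) are not stated in this article; they are assumptions based on the commonly reported Xbox One specs.

```python
shaders = 768                  # from the article
clock_hz = 853_000_000         # assumption: commonly reported 853MHz GPU clock
flops_per_shader_clock = 2     # assumption: one fused multiply-add per clock

# Theoretical peak = shaders x operations per clock x clock speed.
peak_flops = shaders * flops_per_shader_clock * clock_hz
peak_tflops = peak_flops / 1e12

print(f"Theoretical peak: {peak_tflops:.2f} TFLOPS")
```

This is a theoretical peak, so an API change can’t raise it; what can change is how close real games get to it.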
Ironically, the PS4 will possibly benefit a little from all of this too. Wardell has confirmed the PS4’s API is already very low level. That API (GNM) also has a higher-level cousin, known as GNMX (which we’ve known about for some time). We’ll delve more into the PS4’s API and how Vulkan will interact with it in the next few days, but for now it’s best to tackle one thing at a time. The PS4 will possibly benefit because multi-platform developers (EA, Ubisoft and so on) will likely be more willing to invest time in optimizing engines which are multi-thread orientated for rendering. We know that The Order: 1886 uses multi-core command buffer generation, for example… so it will be interesting to see how the impact of DX12 and other API changes is felt on the PS4.
Still, it’s very positive news for both PC gaming and the Xbox One… which is good for everyone involved.