In the first part of our analysis of the Xbox Series X Hot Chips conference, we took a deep dive into the CPU and the GPU of the console, as well as some of its functionality such as hardware-based ray tracing. But there was a lot of ‘stuff’ Microsoft revealed about their next-generation console, and so in part two of our coverage, we’ll be looking into the remaining components of the Xbox Series X, including the memory configuration, SSD and audio.
Since we focused so much on both the GPU and CPU in part one of this coverage, I think it makes sense to jump into the Xbox Series X’s audio processing capabilities first. While there’s so much discussion of frame rates, ray tracing and other graphical capabilities, audio is likely going to have just as much of an impact in the next generation, assuming you have the equipment to take advantage of it.
If you’ve not already taken the plunge and purchased a great pair of headphones or a surround sound setup to go with your shiny new 4K TV ready for next-gen, now might be the time to do so. I’ve thrown a couple of Amazon affiliate links at the end of the article if you want a few suggestions.
But getting back to the audio, the Xbox One was interesting because it featured SHAPE, a custom audio block designed by Microsoft. SHAPE was capable of handling 512 voices for game audio, and also had the ability to run Kinect voice commands. This was important because Kinect audio processing was quite demanding, and offloading it to the CPU wasn’t a particularly viable strategy for Microsoft.
Microsoft’s strategy here offloaded more work from the CPU (assuming the developer took advantage of SHAPE), which was one area where the PS4 fell behind the Xbox One. Sony’s strategy relied a lot more on the CPU for processing audio, limiting the number of voices and effects possible, because eating too much into the CPU budget would obviously take precious resources away from other tasks. The CPUs of the eighth-generation machines (again, based on Jaguar, as we discussed in-depth in part one of this analysis) weren’t exactly designed for high performance, meaning each and every cycle was even more precious.
For this generation though, Sony and Microsoft have both outfitted their respective systems with a high-performance custom component designed for audio processing. The PlayStation 5’s audio processing is done on the Tempest Engine, which is essentially a tweaked Compute Unit designed to handle audio. Cerny stated in the Road to PS5 presentation that the Tempest Engine was capable of performance about on par with the full PS4 Jaguar processor, assuming all 8 cores were being used for nothing but audio. For reference, this works out to about 102GFLOPS for the PS4 (for Jaguar, you multiply the clock frequency by the number of cores, then by 8 floating-point operations per cycle).
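The rough formula above can be sketched in a few lines, reproducing the ~102GFLOPS figure for the base PS4:

```python
# Back-of-the-envelope peak FLOPS estimate, following the formula in
# the text: clock frequency x core count x FLOPs per cycle.
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int = 8) -> float:
    """Peak single-precision GFLOPS for a simple in-order CPU design."""
    return clock_ghz * cores * flops_per_cycle

# PS4: 8 Jaguar cores at 1.6GHz, 8 FP32 ops per cycle
print(round(peak_gflops(1.6, 8), 1))  # -> 102.4
```

The 8 FLOPs-per-cycle figure comes from Jaguar's 128-bit SIMD units (a 4-wide add plus a 4-wide multiply per cycle); change that parameter and the same formula works for other simple CPU designs.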
The slide from Microsoft here is very interesting, as Microsoft states that the single-precision floating-point (SPFP) hardware math performance is greater than all of the Xbox One X CPU cores working together. In many ways, a lot of what Microsoft are claiming here seems quite close to what Sony are able to offer with the PS5, though technically, if Microsoft’s FP math is accurate, the FP performance of their audio component is actually more than the PS5’s, because the Xbox One X’s Jaguar CPU is clocked at 2.3GHz, 700MHz faster than the 1.6GHz Jaguar of the base PS4 that Cerny was comparing against. Microsoft haven’t made the same HRTF claims as Sony, but right now drawing direct comparisons is quite tricky.
Microsoft created 3 audio engines: CFPU2, MOVAD and finally Logan. The first handles audio convolution, FFT and reverb. MOVAD, which Microsoft refers to as a ‘hyper real-time hardware audio decoder’, is capable of 300x real-time channels of simultaneous decoding, and boasts a signal-to-noise ratio of over 100dB.
Logan comprises 4x DSP cores, and is capable of a plethora of tasks including audio FX and XMA decode. Again, Microsoft states here that it’s able to achieve 300x real-time channels of XMA hardware decoding.
Microsoft’s mention of 300x real-time channels, for example, might seem considerably lower than Sony’s 5,000 sound sources, but Cerny also makes a point of saying that number isn’t particularly useful, and that the quality of those samples would suffer. Long story short, I would say that the audio components of both consoles will be a leap ahead of what we have now, but we lack a lot of the finer details necessary to draw deep parallels between the two solutions.
I’ll throw this in here too: Microsoft also mentions the security and decryption abilities of the SoC. All of these features seem to be Microsoft-designed hardware engines. It’s hard to say if this is the only stuff Microsoft engineers actually designed, or whether there’s more that’s not stated here. To our understanding, Microsoft largely let AMD take the lead with the design of the GPU and CPU of the Xbox Series X, other than asking for specific customizations to enable certain functionality (ie, the backwards compatibility of the Xbox Series X).
Now, let’s move over to RAM, which is, of course, where the data for games as well as the OS is stored. For the Xbox Series X, Microsoft went with a split memory strategy, likely in an effort to reduce the cost of the machine while still offering the GPU a substantial amount of memory bandwidth.
To this end, there’s 10GB of memory on a 320-bit bus, offering 560GB/s of bandwidth, while the other 6GB offers a pretty chunky 336GB/s, which is still more memory bandwidth than the Xbox One X and its 326GB/s.
The 10GB is known as ‘optimum’ memory, and given its higher bandwidth, it’s where you’d ideally store data destined for the GPU. The other 6GB is known as standard memory, and 2.5GB of this space is reserved for the OS and system functionality, leaving 3.5GB for games to use. Ideally, data for things such as the CPU and audio will reside here, and there’s more than enough bandwidth to drive those two components.
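As a sanity check on the two bandwidth figures, here's a minimal sketch. It assumes the 14Gbps GDDR6 modules the console is reported to use, with the standard 6GB pool effectively read over a 192-bit slice of the 320-bit bus:

```python
# GDDR6 bandwidth = (bus width in bytes) x (per-pin data rate in Gbps).
# 14Gbps per pin is assumed here, not stated on the Hot Chips slide.
def gddr6_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float = 14.0) -> float:
    return (bus_width_bits / 8) * gbps_per_pin

print(gddr6_bandwidth_gbs(320))  # 10GB 'optimum' pool  -> 560.0 GB/s
print(gddr6_bandwidth_gbs(192))  # 6GB 'standard' pool  -> 336.0 GB/s
```

Both results line up with Microsoft's quoted 560GB/s and 336GB/s figures.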
Naturally, just like with the PlayStation 5, this total of 16GB doesn’t seem a huge upgrade compared to the 12GB of the Xbox One X. But these next-generation consoles have a secret weapon: the SSD. I’ve gone more extensively into why the SSDs are important in a separate video (why SSDs in a console change everything). I’ll be putting part two out later on when we finally have everything confirmed for both consoles, but for now, I’ll link to it and explain briefly.
Interestingly, there were rumors at one point or another that both the PlayStation 5 and Xbox Series X would use HBM2, but of course, neither machine ended up doing this. It appears that for Microsoft at least, this was considered at one point. As AnandTech reported in follow-up questions:
“We’re not religious about which DRAM tech to use. We needed the GPU to have a ton of bandwidth. Lots of channels allows for low latency requests to be serviced. HBM did have an MLC model thought about, but people voted with their feet and JEDEC decided not to go with it.”
Mechanical drives have a slow rate of pulling in data, as well as long latency to actually access that data, because of the time it takes for the head of the HDD to find the data on the spinning disk (seek time). SSDs allow GB/s of data to be pulled into memory, with the Xbox Series X capable of 2.4GB/s raw speeds, though the actual number is closer to 4.8GB/s because, in reality, you’ll compress the data.
Microsoft has also confirmed that the 4.8GB/s is a conservative figure, and real-world speeds of up to 6GB/s aren’t uncommon. As I also explained in an exclusive, Sony is working on ways of improving their lossless compression for the future, and it’s a good possibility Xbox can do much the same.
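The relationship between these numbers is simple: effective throughput is just the raw drive speed multiplied by the achieved compression ratio.

```python
# Effective SSD throughput = raw read speed x compression ratio.
raw_gbs = 2.4            # Xbox Series X raw read speed
typical_ratio = 2.0      # Microsoft's average 2:1 figure

print(raw_gbs * typical_ratio)  # -> 4.8 GB/s, the typical effective speed
print(raw_gbs * 2.5)            # -> 6.0 GB/s, the upper end Microsoft mentions
```

So the quoted 6GB/s real-world peak corresponds to data that happens to compress at about 2.5:1 rather than the average 2:1.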
As we know by now, the Xbox Series X contains a total of 1TB of space, with 8 channels connecting the SSD to the rest of the system. According to the Hot Chips information, the internal storage connects via 2x PCIe gen 4 lanes to the rest of the system, with another 2 lanes reserved for the user-upgradeable SSD (which of course is the external slot). There are 8 total PCIe lanes from the I/O hub, and the other 4 are used for things such as connecting to USB ports and other communication.
To this end, the Xbox Series X has a plethora of technology which aims to increase the speed at which data can be pulled into the system, and this means that the amount of RAM, for all intents and purposes, is effectively far greater, because you’ve got the ability to quickly swap in data as needed, which has enormous benefits for, say, a texture.
Microsoft points out in their Velocity Architecture material that the price of RAM had previously fallen (in terms of price per unit of capacity) about 30 percent on a year-on-year basis (or to put it another way, if X amount of capacity cost 100 bucks, it would be 30 percent cheaper a year later), but for the last several years this hasn’t been the case, and recent reductions in pricing have been very small. The NAND memory in SSDs, though, is falling in price much faster, so this is where a high-speed pool of NAND (the SSD) comes in.
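To make that 30 percent figure concrete, here's what $100 worth of RAM capacity would cost after a few years if the historical trend had actually held:

```python
# Compounding the quoted 30% year-on-year price decline.
def price_after(start: float, yearly_decline: float, years: int) -> float:
    return start * (1 - yearly_decline) ** years

print(round(price_after(100, 0.30, 1), 2))  # -> 70.0 (30% cheaper after one year)
print(round(price_after(100, 0.30, 5), 2))  # -> 16.81 after five years
```

That compounding effect is why the trend flattening out matters so much: console makers could no longer count on RAM getting dramatically cheaper between planning and launch.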
Not only does this improve load times (and thus enables things such as Quick Resume), it also means Microsoft can use technology such as Sampler Feedback (we’ll get to that in a moment). Interestingly, Microsoft had apparently planned such a transition back in 2007, just a couple of years after the Xbox 360 launched. So some six years prior to the launch of the Xbox One, SSD-like tech was hoped for in a future Xbox console.
There’s also a custom hardware block to enable decompression of SSD data, which Microsoft believes will on average achieve a 2:1 ratio (again, this means the 2.4GB/s SSD hits about 4.8GB/s, though it can be up to about 6GB/s in theory).
This hardware block supports LZ decompression (an industry standard) along with proprietary Microsoft tech designed around BCPack. If this was being handled by just the Zen 2 cores of the Xbox Series X, it would eat up over 4 of them (essentially spilling into a fifth core). This again highlights why this is so important, and why technology on PC such as Nvidia’s RTX IO (which offers a similar decompression ratio of 2:1) is going to matter so much given the speeds of PC SSDs.
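BCPack itself is proprietary, but you can get a feel for what LZ-family lossless compression does using Python's zlib module (DEFLATE, an LZ77 derivative). Note this toy payload is far more repetitive than real game data, so it compresses much better than the 2:1 Microsoft quotes:

```python
import zlib

# A repetitive, texture-atlas-like payload; real game assets compress
# far less well than this toy example.
payload = b"texture block " * 4096
packed = zlib.compress(payload, 6)

ratio = len(payload) / len(packed)
print(f"{len(payload)} -> {len(packed)} bytes, ratio {ratio:.1f}:1")

# Decompression must reproduce the input exactly: the scheme is lossless.
assert zlib.decompress(packed) == payload
```

Doing this in software is exactly the work that would otherwise eat those four-plus Zen 2 cores at 2.4GB/s, which is why a fixed-function block makes sense.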
So, getting back to the Sampler Feedback system then. Texture maps are stored at different MIP (detail) levels, and which level of detail is shown depends on a variety of things, including your distance to an object. But sometimes not all of a texture is visible (something is blocking it from view, the angle you’re viewing it at, and so on).
With Sampler Feedback Streaming, the game engine can better understand what portions of a texture are actually being sampled (ie, what’s visible) in the rendered image. Lower-quality textures can be replaced as necessary, and the memory savings are huge: no longer do you have massive chunks of data dedicated to an entire 4K texture when only a small portion of it is shown. Instead, textures can be split into tiles, and portions of those tiles can be streamed in as needed.
Sampler Feedback ultimately allows only the relevant portions (and textures) to be pulled into main system memory, which Microsoft states helps improve ‘effective’ memory capacity by 2.5x versus just loading in full textures.
Variable Rate Shading is also discussed by Microsoft, and again I have a video detailing VRS much more extensively than I’m going to go into here, so it’s linked as usual. The idea of VRS is that not all elements of a scene are as important as each other, with objects in your peripheral vision being something you perceive with less detail versus something in your focus. To this end, the goal of Variable Rate Shading is to divert rendering resources away from areas of a scene which will be less impactful, either to increase the shading (think drawing of detail, basically) of areas of a scene which matter more, or to improve performance (ie, so that each frame takes less time to be drawn).
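A conceptual sketch of the idea follows. The distance-from-focus heuristic and threshold values here are hypothetical (real engines also use motion, contrast and content cues), but the rate notation matches the DX12 style, where 1x1 means one shade per pixel and 4x4 means one shade per 16 pixels:

```python
import math

# Pick a coarser shading rate for screen tiles further from the
# viewer's point of focus (hypothetical thresholds, in pixels).
def shading_rate(tile_center, focus, full_rate_radius=200, coarse_radius=600):
    dist = math.dist(tile_center, focus)
    if dist < full_rate_radius:
        return "1x1"   # in focus: shade every pixel
    elif dist < coarse_radius:
        return "2x2"   # mid-periphery: one shade per 4 pixels
    return "4x4"       # far periphery: one shade per 16 pixels

print(shading_rate((960, 540), (960, 540)))  # screen center -> 1x1
print(shading_rate((60, 40), (960, 540)))    # far corner    -> 4x4
```

The hardware then runs the pixel shader once per coarse block instead of once per pixel in the low-rate regions, which is where the performance win comes from.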
Variable Rate Shading is part of AMD’s RDNA 2 line of technology, though it isn’t new or unique to AMD. Nvidia has NAS (Nvidia Adaptive Shading), which is part of the RTX 20 line of cards, and Intel’s Xe can support it too. API-wise, Nvidia were using their own custom extensions, but for the PC and Xbox it’s part of DX12 Ultimate (DirectX 12_2). The amount of space it eats up on the die itself is tiny, as Microsoft mentions, and therefore it makes no sense for it not to be included in a modern GPU.
Interestingly, I had been told (as I outlined in my PS5 APU bring-up coverage) that RDNA 1 had missing features because of a lack of time to get them working before the first generation of RDNA shipped, and it’s very likely that RDNA 1 had originally been intended to launch with VRS (though I am not 100 percent sure of this). The issues with RDNA 2 features not working did cause Sony to have to test the PS5 GPU with missing features early on too. While this doesn’t impact the consoles, it is a shame for those who invested in RDNA 1 class RX 5000 graphics cards.
While covered less in the Xbox event, there are also Mesh Shaders. I’ve detailed these multiple times before, so I’ll cover them only briefly here. They were first really previewed in the mainstream with Turing, and are now also part of RDNA 2. This has the potential to profoundly change how the geometry pipeline of GPUs functions, and is essentially a rather different approach to the graphics pipeline.
Instead of the traditional graphics pipeline, Mesh Shaders run more like a compute shader, which provides a huge amount of control over culling and detail. In the old way of thinking, geometry was processed as a whole, with the vertices (corners) of triangles needing to be processed first, before anything could be culled.
This is a very complicated topic, but the short of it is that not only is this process of rendering slower, it also means that nuking back-facing parts of geometry (ie, parts of the geometry which won’t be seen) needs to be done later in the rendering process, eating up yet more time.
Mesh Shaders aren’t a tweak or a ‘fix’ to this problem; instead they use an entirely new methodology. As mentioned a moment ago, Mesh Shaders on the Xbox Series X (and other supported hardware) basically run as compute shaders, with work dispatched as thread groups (which are executed on the compute units of the GPU). Not only does this offer greater control, but you’ve also got the benefit of being able to cull much earlier in the pipeline and draw objects with a great deal more precision.
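As a toy illustration of the kind of early, per-triangle test a mesh shader thread group can perform (plain Python standing in for GPU shader code here), this culls back-facing triangles before any further vertex work is spent on them:

```python
# Early per-triangle culling: a screen-space signed-area (winding) test.
def back_facing(v0, v1, v2) -> bool:
    """Signed area <= 0 means the triangle winds clockwise in screen
    space, i.e. it faces away from the camera and can be culled."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return area2 <= 0

triangles = [
    ((0, 0), (1, 0), (0, 1)),  # counter-clockwise: front-facing, keep
    ((0, 0), (0, 1), (1, 0)),  # clockwise: back-facing, cull
]
survivors = [t for t in triangles if not back_facing(*t)]
print(len(survivors))  # -> 1
```

In a real mesh shader this test runs in parallel across a thread group over a small chunk of the mesh (a ‘meshlet’), so rejected triangles never reach the rasterizer at all.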
Nvidia’s Mesh Shader demo, known as Asteroids, is a great example of this. Asteroids is, well, just that really: flying through a densely packed field of them, with faraway objects having just 20 triangles (and looking not much like an asteroid), and those closest to the camera having a whopping 5 million-plus triangles.
For those who will immediately ask me about the PlayStation 5: the PS5 handles things a bit differently than the Xbox. I’ll simply say the Geometry Engine works in conjunction with other elements of the console to perform very similar (but slightly different) functionality. I’ll cover how they compare against one another in another video, when Sony reveals more info. I’ve covered all I’m allowed to say about the PS5 GE in other videos for now.
Again, touching on the GPU Evolution slide from Microsoft, the company plots a graph of how graphics technology has evolved from the debut of the Xbox One base console back in 2013. Seeing the leap from the Xbox One X to the Xbox Series X, it’s clear that a number of things improved massively.
We’ve discussed on the channel a number of times how TFLOPS across generations of GPU architecture isn’t a great measurement of performance (again, look at these Polaris RX 480 numbers versus the newer RDNA 1 based RX 5700; don’t forget we’ve configured the systems here to be a match, so when the Polaris and RDNA 1 cards are running at the same clock speed, their relative TFLOPS are identical, yet real-world performance clearly isn’t).
But still, there are a lot of other things in play. The Xbox Series X cranks the triangle rate up from 4.4 to 7.3 GigaTriangles (and again, is way more efficient), and thanks to the Xbox Series X featuring double the number of ROPs of the Xbox One X (64 versus just 32) and higher clocks, GigaPixel performance sees a huge increase (116 GPixels versus just 35). For those wondering, while the slide doesn’t confirm the number of ROPs, you can easily calculate the fill rate or the number of ROPs, because fill rate (GPixels) is the number of ROPs multiplied by the clock frequency. So, for the Xbox Series X, we know the GPU clock speed is 1825MHz, so we can multiply this frequency by 64 and we come to the 116GPixels Microsoft mention here.
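The arithmetic from the paragraph above, as a quick sketch:

```python
# Fill rate (GPixels/s) = number of ROPs x clock frequency.
def fill_rate_gpixels(rops: int, clock_mhz: int) -> float:
    return rops * clock_mhz / 1000  # MHz -> GHz conversion

print(fill_rate_gpixels(64, 1825))  # Xbox Series X -> 116.8 GPixels/s
```

Run the same formula in reverse against a known fill rate and clock, and you can recover a GPU's ROP count when it isn't stated on a slide.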
Again, this is not counting the massive improvements to the rendering efficiencies of RDNA 2 against the Polaris based Xbox One X. Check the first part of my analysis if you need further information on how this all stacks up though.
Conclusions then: well, the Xbox Series X is quite a monster. The CPU and SSD alone are a night-and-day difference against the old generation. The ability to have bigger and more expansive worlds, tons more AI, and almost immediate access to any piece of data required allows developers a degree of creativity and freedom which honestly just wasn’t possible with the older generation.
The GPU in the console is what’s getting most of the attention, and while the 12.1TFLOPS is an easy thing to focus on, it’s really not the story of the consoles. The API and technology that’s being built around the Xbox Series X is what’s the real story here. Microsoft has two babies, the PC and Xbox, and the Xbox is by and large a carbon copy of the desktop feature set of RDNA 2.
The next generation of Nvidia cards has already been confirmed to handle things like data decompression directly on the GPU (offloading the work from the PC’s CPU cores, just like the Xbox). Given we’re in a world where PC SSDs are hitting up to 7GB/s read speeds, this is imperative, or a drastic amount of CPU time can be eaten up just decompressing the damn data.
It’s also worth noting that a lot of experimental technology is being explored right now, especially uses for things such as machine learning. One of Microsoft’s studios has been experimenting with using DirectML to take low-resolution assets such as textures and upsample them in real-time. Apparently, to a user, the experience is so good you couldn’t tell a natively authored 4K texture from a tiny lower-resolution one. Currently it has limitations, because right now it’s still difficult to train the AI for this, but it’s a great example of a potential use for the next-gen.
Technology such as ray tracing isn’t just for pretty shadows and more bounces from reflections either. In theory, it’s possible to use it for advanced AI and audio too, as I discussed with Nvidia in an interview, and Sony too are experimenting with this (as outlined in a Sony 2020 technology PDF). I think it’s fair to say that Microsoft has very similar plans for the Xbox Series X.
The Xbox Series X has the technology to offer stellar visual experiences, and Microsoft have empowered developers with a fairly balanced piece of hardware. There are still a lot of questions Microsoft hasn’t tackled with the console, unfortunately, and I suspect remaining NDAs from AMD are in place until the RX 6000 series is revealed to the public.
Amazon Affiliate links
Budget 7.1 headphones
HyperX Cloud Revolver S Dolby 7.1 Gaming Headset
High end 7.1 Headphones
SteelSeries Arctis Pro Wireless
4K Gaming Screens
Budget 4K Gaming TV
High-end 4K Gaming TV