When the hacker group H4LT leaked the Xbox One’s SDK and its accompanying documentation, we gamers and journalists were given a fantastic insight into the hardware and software that make up Microsoft’s Xbox One. When the SDK leak first hit, gaming news headlines focused primarily on the revelation that the seventh CPU core of the Xbox One (well, up to 80 percent of it anyway) was now usable by game developers. This change further extended the CPU performance lead the Xbox One has over Sony’s PlayStation 4 (thanks to the Xbox One’s higher CPU clock speed). But in reality, there’s a lot more revealed inside the documentation than just that.
For example, if you’ve ever wondered why Xbox One games are more likely to experience frame rate drops when an “Achievement Unlocked” notification pops up on screen, you’ll have your answer soon enough. It’s our mission, starting with this, the first in a series of articles, to take you through the various improvements and changes in the Xbox One’s architecture, SDK and development cycle, explaining the language and providing insights into Compute, ESRAM usage, APIs and just about everything else that makes Microsoft’s next-gen console tick.
If you’re intimately familiar with the Xbox One’s hardware specs, feel free to skip this paragraph – if you’re not, here’s a basic crash course. The Xbox One uses a custom-built 28nm APU, co-designed by AMD and Microsoft. In this APU package sits a plethora of different components, including an x86-64 CPU (eight AMD Jaguar cores) running at 1.75GHz. At the clock speed Microsoft are running it, the CPU puts out 112 GFLOPS of computing power (total performance, across all eight cores). If we count only the performance available to developers, however, the number goes down to about 95 GFLOPS. The ESRAM is also included on die and is used by the Xbox One’s GPU like a fast cache. Despite the theoretical peak of the ESRAM’s performance hitting 200GB/s, in the real world Microsoft suggests you assume you’ll be rendering a scene with 102GB/s available, though your mileage may vary by up to an additional 20 – 30 percent. An AMD-based GPU, featuring 12 GCN compute units (running at 853MHz), provides the Xbox One’s graphics. Finally, off package, there’s the 8GB of DDR3 2133MHz RAM, giving up to 68GB/s of memory bandwidth.
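The headline numbers in the crash course can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope check, not anything taken from the SDK. It assumes 8 single-precision FLOPs per cycle per Jaguar core, 64 lanes per GCN compute unit with 2 FLOPs per lane per cycle, and a 256-bit DDR3 bus; none of those three figures are stated in this article, so treat them as assumptions. It also assumes the developer-available figure corresponds to six full cores plus 80 percent of the seventh.

```python
# Back-of-the-envelope check of the headline hardware figures.
# ASSUMPTIONS (not stated in the article): 8 single-precision FLOPs per
# cycle per Jaguar core, 64 lanes per GCN compute unit, 2 FLOPs per lane
# per cycle (fused multiply-add), and a 256-bit DDR3 memory bus.

CPU_CLOCK_GHZ = 1.75
CPU_FLOPS_PER_CYCLE = 8      # assumed

GPU_COMPUTE_UNITS = 12
GPU_CLOCK_GHZ = 0.853
GPU_LANES_PER_CU = 64        # assumed
GPU_FLOPS_PER_LANE = 2       # assumed

DDR3_MEGATRANSFERS = 2133
DDR3_BUS_BITS = 256          # assumed


def cpu_gflops(cores: float) -> float:
    """Peak CPU throughput in GFLOPS for a (possibly fractional) core count."""
    return cores * CPU_CLOCK_GHZ * CPU_FLOPS_PER_CYCLE


def gpu_gflops() -> float:
    """Peak GPU throughput in GFLOPS."""
    return (GPU_COMPUTE_UNITS * GPU_LANES_PER_CU
            * GPU_FLOPS_PER_LANE * GPU_CLOCK_GHZ)


def ddr3_gb_per_s() -> float:
    """Peak DDR3 bandwidth in GB/s (transfers per second times bytes per transfer)."""
    return DDR3_MEGATRANSFERS * (DDR3_BUS_BITS / 8) / 1000


if __name__ == "__main__":
    print(f"CPU, all 8 cores:      {cpu_gflops(8):.1f} GFLOPS")
    print(f"CPU, 6 cores + 80%:    {cpu_gflops(6.8):.1f} GFLOPS")
    print(f"GPU:                   {gpu_gflops():.1f} GFLOPS")
    print(f"DDR3 peak bandwidth:   {ddr3_gb_per_s():.1f} GB/s")
```

Under those assumptions the results line up with the quoted 112 GFLOPS, roughly 95 GFLOPS, 1.31 TFLOPS and 68GB/s.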
Dual Graphics Drivers & Operating System
The Xbox One doesn’t run a single operating system; rather, three OSes operate simultaneously. The ‘Host OS’ is a relatively lightweight hypervisor that controls and runs the other two operating systems. Microsoft label these as ERA (Exclusive Resource Allocation) and SRA (Shared Resource Allocation). As one can see in the diagram, the Exclusive Partition is what eats up the bulk of the Xbox One’s resources, and is the OS responsible for running games. The Xbox One’s Shared Partition (which runs on a Windows 8 core) meanwhile runs other functions, such as system services. These range from messaging your friends to performing background updates and other functions which aren’t related to the game.
The ERA has multiple states, as we discussed back in June 2014 in our Microsoft Developer Day analysis. “Full Screen is the first, meaning all of the resources are available to games, and this applies even when the application is ‘snapped’. The second is ‘constrained’ – while the RAM allocation doesn’t shrink, CPU and GPU resources are reduced slightly, as there’s no user input with the game so a slight drop in performance won’t impact things. Finally, there’s suspended. This state means that the game is effectively in a halted state on the CPU and GPU (it’s using zero resources), but it’s still resident in memory and using the same amount of RAM.”
If you’ve played an Xbox One title, you’ll have noticed that one moment a game can experience slowdown (frame rate drops) is when you earn an Achievement. Microsoft specifically point the finger at this in their own documentation. “Cross-OS calls can take a significant amount of time, up to several milliseconds,” says Microsoft. This means that you can’t call these functions multiple times per frame (or at least, it’s not good practice to do so).
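To see why “several milliseconds” matters, it helps to compare it against the time a game has to produce one frame. The sketch below is purely illustrative: the 3 ms call cost is a made-up stand-in for “several milliseconds”, and the function names are hypothetical rather than anything from the SDK.

```python
# Illustrative frame-budget arithmetic for slow cross-OS calls.
# ASSUMPTION: the 3 ms cost per call is a hypothetical stand-in for
# Microsoft's "up to several milliseconds"; it is not a measured figure.

def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given frame rate."""
    return 1000.0 / fps


def budget_fraction(call_ms: float, calls_per_frame: int, fps: float) -> float:
    """Fraction of a frame's time budget consumed by blocking calls."""
    return (call_ms * calls_per_frame) / frame_budget_ms(fps)


if __name__ == "__main__":
    assumed_call_ms = 3.0  # hypothetical cost of one cross-OS call
    for fps in (30, 60):
        for calls in (1, 3):
            share = budget_fraction(assumed_call_ms, calls, fps)
            print(f"{fps} fps, {calls} call(s) per frame: "
                  f"{share:.0%} of the frame budget")
```

Even one call of that assumed size eats a noticeable slice of a 60 fps frame, which is consistent with Microsoft’s advice to avoid making such calls repeatedly within a frame.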
Xbox One has dual graphics drivers? You might well ask, and indeed the answer is yes. Remember that the GPU (Graphics Processing Unit) inside the Xbox One is a Radeon part using the GCN architecture. When the first development kit PCs rolled out in their alpha state, back in April 2012, Microsoft were using a “Generic DirectX 11 Manufacturer Supplied Driver”. But they’d also included a “Specific Graphics Driver developed by Microsoft”. A few months later, in July 2012, Microsoft had gotten a lot of the Durango User-Mode Driver (UMD) functional, and that release of the driver supported important features such as tessellation. This driver slowly evolved over the following months, gaining features, and went hand in hand with PIX (Performance Investigator for Xbox) to help developers get games running on the hardware as quickly as possible. It makes logical sense, particularly if you believe the murmurings from developers prior to the Xbox One’s launch. If you recall, there were a lot of complaints that the early Xbox One SDK was awkward and unfriendly. The driver history ties in with those rumors and shows Microsoft were trying to knock the problem on the head, but at the same time, the cost was a driver that wasn’t optimized for raw performance.
Things changed considerably in July 2013, however. Microsoft added a preview version of the Direct3D Monolithic Runtime. The primary difference between Monolithic Direct3D and regular Direct3D (the one you’re running on, say, your Windows 8 PC) is that it’s been optimized for a fixed hardware spec. Microsoft could therefore remove a layer of abstraction, which in turn increases the performance of the Xbox One’s version of Direct3D. Functionality that wasn’t relevant to a closed system was removed, and the Direct3D runtime and UMD were starting to merge together.
Over a year later (and after numerous rather large performance improvements to the Xbox One’s drivers), it was obvious the Direct3D Monolithic Driver was the direction Microsoft were headed. As you read over the APIs, it’s of little surprise that in May 2014 Microsoft officially led with the line (under the “What’s new” section): “Stock Direct3D support has been removed in the May 2014 XDK, in favor of Monolithic Direct3D”. Microsoft also announced a few other rather large changes, including that Asynchronous Compute is no longer in preview mode, which means developers are able to leverage the compute potential of the Xbox One’s GPU (more on this later).
Xbox One GPU reserves and Allocation
Reading through the SDK, just as the deprecation and eventual retirement of the User-Mode Driver (UMD) was evident, the changes in policy regarding Kinect are also hiding in plain sight. As you’ll likely know, a fair chunk of the memory bandwidth (of the system’s DDR3 2133 RAM), CPU performance (hence the recent revelation of access to the Xbox One’s seventh CPU core) and GPU were cordoned off for the purposes of running the Xbox One’s Kinect. If a title uses these Kinect functions, 91.5 percent of the Xbox One’s GPU is available to the game, but should developers not use certain features, almost 100 percent of GPU performance is available.
Game OS titles can access the full 1.31 TFLOPS of GPU power if the developers meet a few conditions. The NUI, or Natural User Interface, is what allows users to control apps using gestures rather than the Xbox One’s controller. Microsoft originally stated it was the ‘preferred user interface technology’ for the Xbox One, and it has a GPU reserve of about 4.5%. If a game doesn’t use the NUI, developers can allocate that reserve to the game, but only during gameplay. Microsoft specifically prohibit its usage during menus and lobbies. The thought behind this is to still allow biometrics to function. Another caveat: it doesn’t matter if we as users disconnect the Kinect. A game title must specifically request this reserve.
The remaining four percent of GPU reserves is allocated to the Extended System Reserve (ESR). This is used for system UI rendering and for message dialogs. In other words, everything that the system has to render (for example, a chat invite) asks for its GPU pound of flesh. Because of the uncertain nature of this load, Microsoft assert that the title must be able to tolerate a variance of up to 3%. Furthermore, you as a user can have a profound effect on the reserve. In some instances, the ESR will force the reserve to 4 percent of the GPU per frame. In other words, for certain actions you take, or for notifications and other prompts, the game will not have access to 100 percent of the GPU (even if it normally does) and instead has to operate with only 96% of the GPU available to it while it’s drawing that particular frame of animation.
While it might not sound like a lot, it’s about 50 GFLOPS of computing power that the game isn’t able to use. Because of this, Microsoft suggest that you alter how the game runs while this happens to fit into the lower performance. It suggests either downscaling the internally rendered image and then upscaling it (the example given is 1440×810, a total of 1,166,400 pixels, compared to native 1080p’s 2,073,600 pixels), or reducing effects quality (so, perhaps render things in the distance with a lower level of detail, or cut back on certain shadow details). This will generally happen in situations where the title is in fill mode and other menus or items are on the screen, so users are unlikely to notice the drop in quality.
This corresponds to the multiple view states that the Xbox One has: Full View, where the application fills the window at 1920×1080; Fill View, where an exclusive application is sharing screen space with an app (which is snapped to the right); and Not Visible (in other words, it’s in the background).
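The reserve percentages and the downscaling example above can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted in this article (1.31 TFLOPS, a 4.5% NUI reserve, a 4% Extended System Reserve, and the 1440×810 example); the function names are hypothetical.

```python
# Rough arithmetic for the GPU reserves and the downscaling example.
# All inputs are the figures quoted in the article; names are illustrative.

GPU_TFLOPS = 1.31
NUI_RESERVE = 0.045   # Natural User Interface (Kinect) reserve
ESR_RESERVE = 0.04    # Extended System Reserve


def reserved_gflops(fraction: float) -> float:
    """GPU throughput, in GFLOPS, tied up by a reserve of the given size."""
    return GPU_TFLOPS * 1000 * fraction


def pixel_ratio(width: int, height: int,
                native_width: int = 1920, native_height: int = 1080) -> float:
    """Share of native-resolution pixels rendered at a lower resolution."""
    return (width * height) / (native_width * native_height)


if __name__ == "__main__":
    default_share = 1 - NUI_RESERVE - ESR_RESERVE
    print(f"GPU share with both reserves held back: {default_share:.1%}")
    print(f"ESR cost: {reserved_gflops(ESR_RESERVE):.1f} GFLOPS")
    print(f"NUI cost: {reserved_gflops(NUI_RESERVE):.1f} GFLOPS")
    print(f"1440x810 renders {pixel_ratio(1440, 810):.2%} of 1080p's pixels")
```

The 4% reserve works out to a little over 50 GFLOPS, and the 1440×810 example renders just over half the pixels of native 1080p, which is why it frees up far more headroom than the 4% the system takes.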
CPU & Memory Allocation Info
When a game is running in the “Full” state (simply put, the game is active and the user is playing it), unsurprisingly the Xbox One devotes most of the system’s performance to it. As mentioned earlier, the Xbox One uses eight AMD Jaguar CPU cores, running at 1.75GHz. Six of these cores are normally available to games (with 100 percent of their resources available to the game) and, should the resource configuration for the game request it, 50 – 80 percent of the resources of the seventh CPU core can also be used. In addition, 5GB of the Xbox One’s 8GB is available to game developers, meaning in effect the Xbox One has 3GB of RAM reserved for the operating system. Further, there is no disk paging available for this RAM once it’s consumed. This means developers have to be on their toes when allocating memory resources.
|Available Resource|Full|Full Extended|Constrained|
|CPU Cores|6 CPU cores|6 CPU cores + 50 – 80% of the seventh|4 CPU cores|
|% GPU Time|91.5 percent|Almost 100 percent|45 percent|
|Available System Memory|5 GB|5 GB|5 GB|
While we’ve heavily discussed Microsoft increasing CPU allocation by freeing up the seventh CPU core, it’s vital to remember that the proportion of CPU time available on the seventh core can vary drastically based on what’s happening with the Xbox One. Microsoft assures developers they can count on having at least 50 percent of the core available at all times, but when the system must process certain commands spoken by the user (say, “Xbox go to friends”), 50 percent of the seventh core is required to run that task. So in effect, in the worst case scenario, developers gain 50 percent of a single CPU core. If these commands are not running (which should be the majority of the time), this rises to 80% CPU time available for the game. In effect, this means the amount of CPU performance available on the seventh core can vary by 30 percent, and currently the title isn’t informed that precious CPU performance is about to be snatched away from it. This clearly isn’t an ideal situation and Microsoft admit as much, pointing out that an updated SDK release will fix this issue and provide the game with notification. Optimization also isn’t easy at the moment, because performance counters aren’t providing details of what’s happening on the Xbox One’s seventh CPU core. This means developers can’t really profile the performance and optimize as well as they should. Again, Microsoft are keen to stress this will also be fixed ASAP.
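To put the swing on the seventh core into per-frame terms, the sketch below converts the 50 and 80 percent shares quoted above into milliseconds of CPU time per frame. It is a simple illustration using only those two figures; the function names and the choice of 30 and 60 fps are assumptions for the example.

```python
# How much seventh-core time a game can count on per frame.
# Inputs are the 50% (guaranteed) and 80% (typical) shares quoted in the
# article; function names and frame rates are illustrative assumptions.

FULL_CORES = 6
SEVENTH_MIN_SHARE = 0.50   # guaranteed even while a voice command runs
SEVENTH_MAX_SHARE = 0.80   # available when no system voice command runs


def effective_cores(seventh_share: float) -> float:
    """Total core-equivalents available to the title."""
    if not SEVENTH_MIN_SHARE <= seventh_share <= SEVENTH_MAX_SHARE:
        raise ValueError("seventh-core share must be between 0.5 and 0.8")
    return FULL_CORES + seventh_share


def seventh_core_ms_per_frame(seventh_share: float, fps: float) -> float:
    """Milliseconds of seventh-core time available within one frame."""
    return seventh_share * 1000.0 / fps


if __name__ == "__main__":
    for fps in (30, 60):
        worst = seventh_core_ms_per_frame(SEVENTH_MIN_SHARE, fps)
        best = seventh_core_ms_per_frame(SEVENTH_MAX_SHARE, fps)
        print(f"{fps} fps: {worst:.1f} ms guaranteed, {best:.1f} ms typical, "
              f"{best - worst:.1f} ms can vanish without warning")
    print(f"Core-equivalents: {effective_cores(SEVENTH_MIN_SHARE)} "
          f"to {effective_cores(SEVENTH_MAX_SHARE)}")
```

One cautious reading is to schedule only work that fits in the guaranteed share on that core, and to treat the remainder as opportunistic until the SDK provides the promised notification.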
The other problem with this is that custom voice commands are no longer going to work. So, while your “Xbox record that” will not skip a beat, the specific voice commands developers could create for their games will no longer function. Obviously, this won’t make a difference in certain titles – but the voice commands that you’d use to issue support orders in say Ryse: Son of Rome are an example of the types of things that go bye-bye if developers opt to use the seventh CPU core. No such thing as a free lunch!
The other side effect of all this is that not only do developers get the additional CPU core, but they’re also afforded an additional 1GB/s of precious memory bandwidth from the system’s DDR3 RAM. Since there’s a lot of concern over the memory bandwidth of the Xbox One, this change will be just as important.
The Xbox One’s DDR3 2133MHz RAM can theoretically push 68GB/s (that number is the sum of both reads and writes to the DRAM). In reality, Microsoft concede that this number isn’t achievable, and developers will hit about 80 – 85% of peak memory bandwidth, or roughly 55 GB/s to 57.8 GB/s if you prefer. While we’re on the topic of memory bandwidth, back in August 2014, Microsoft’s SDK update increased GPU DDR3 bandwidth by 1.5 percent by “tuning system bandwidth consumers”.
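The achievable-bandwidth range is just the 80 to 85 percent efficiency figure applied to the 68GB/s peak, and dividing by the frame rate shows how much data can actually move in one frame. The sketch below is a rough illustration using the article’s figures; the per-frame breakdown and function names are additions for the example, not something from the SDK.

```python
# Achievable DDR3 bandwidth and the resulting per-frame data budget.
# The 68 GB/s peak and the 80-85% efficiency range come from the article;
# the per-frame view and the function names are illustrative additions.

PEAK_GB_PER_S = 68.0
EFFICIENCY_LOW = 0.80
EFFICIENCY_HIGH = 0.85


def achievable_gb_per_s(efficiency: float) -> float:
    """Realistic bandwidth at a given fraction of the theoretical peak."""
    return PEAK_GB_PER_S * efficiency


def gb_per_frame(bandwidth_gb_per_s: float, fps: float) -> float:
    """Data (reads plus writes) that can move through DRAM in one frame."""
    return bandwidth_gb_per_s / fps


if __name__ == "__main__":
    low = achievable_gb_per_s(EFFICIENCY_LOW)
    high = achievable_gb_per_s(EFFICIENCY_HIGH)
    print(f"Achievable bandwidth: {low:.1f} to {high:.1f} GB/s")
    for fps in (30, 60):
        print(f"{fps} fps: {gb_per_frame(low, fps):.2f} to "
              f"{gb_per_frame(high, fps):.2f} GB per frame, shared by CPU and GPU")
```

The lower bound comes out at 54.4 GB/s, which rounds to the roughly 55 GB/s quoted above.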
|Xbox One Cache hit type|Latency (lower is better)|
|Remote L2 hit|Approximately 100 cycles|
|Remote L1 hit|Approximately 120 cycles|
|Local L1 hit|Three cycles for 64-bit values; five cycles for 128-bit values|
|Local L2 hit|Approximately 30 cycles|
On to processor core allocation and cache behavior. As we’ve stated a few billion times in this document, the Xbox One contains eight AMD Jaguar CPU cores. They’re arranged as two processor modules, with each module housing four cores. Each module has its own level 2 cache, which its four cores must share, and they share the bandwidth to main memory too. So, for example, core 0 and core 2 share the same cache (as they’re on the same module), and must use the same bus to access the DDR3 memory. But CPU core 5 and CPU core 2 don’t share the same L2 cache, because they’re not on the same module.
If a CPU core wants to access the other module’s level 2 cache (for example, CPU core 1, which is housed in module A, wishes to access the cache housed in module B), it’ll be considerably slower, because the data has to be transferred between modules. The same applies to a hit in a remote level 1 cache, which is slower still. Thus, it’s better for developers to ensure they plan their CPU threads correctly. If a thread is going to interact with OS shared apps, Microsoft rather obviously suggests developers run the code on either core 4 or 5.
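The module layout and the latency table can be combined into a small lookup that shows what a poorly placed thread costs. This is a minimal sketch: it assumes cores 0 to 3 sit in module A and cores 4 to 7 in module B (consistent with the examples above, but not spelled out), uses the approximate L2 figures from the table, and the function names are hypothetical.

```python
# Which cores share an L2 cache, and what an L2 hit costs in each case.
# ASSUMPTION: cores 0-3 form module A (index 0) and cores 4-7 form
# module B (index 1). Latencies are the approximate table values.

CORES_PER_MODULE = 4
TOTAL_CORES = 8
LOCAL_L2_CYCLES = 30     # approximate, from the latency table
REMOTE_L2_CYCLES = 100   # approximate, from the latency table


def module_of(core: int) -> int:
    """Return 0 for module A or 1 for module B."""
    if not 0 <= core < TOTAL_CORES:
        raise ValueError("core index must be between 0 and 7")
    return core // CORES_PER_MODULE


def shares_l2(core_a: int, core_b: int) -> bool:
    """True when both cores sit on the same module and share its L2."""
    return module_of(core_a) == module_of(core_b)


def l2_hit_cycles(core: int, cache_module: int) -> int:
    """Approximate cost of an L2 hit for a core reading from a module's L2."""
    if module_of(core) == cache_module:
        return LOCAL_L2_CYCLES
    return REMOTE_L2_CYCLES


if __name__ == "__main__":
    print("Cores 0 and 2 share an L2:", shares_l2(0, 2))
    print("Cores 5 and 2 share an L2:", shares_l2(5, 2))
    print("Core 1 hitting module B's L2:", l2_hit_cycles(1, 1), "cycles")
    print("Core 1 hitting module A's L2:", l2_hit_cycles(1, 0), "cycles")
```

Keeping threads that share data on the same module avoids paying the remote penalty, which on the table’s figures is more than three times the local L2 cost.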
Microsoft also add that while the level 2 cache has doubled (in comparison to the Xbox 360), the number of cores has also increased by a factor of 2.6. When you couple this with the Xbox One’s pointers consuming twice the memory, on a cache-per-core basis the Xbox One actually has less level 2 cache available than the Xbox 360.
Furthermore, DRAM (main system RAM, in other words the 8GB of DDR3 memory) contention is a bigger issue than it was on the Xbox 360. It’s much easier for the Xbox One’s CPU to adversely impact the GPU’s performance, or vice versa. Thus, Microsoft say, “Optimizing to reduce memory bandwidth is a key strategy for Xbox One.”
The Xbox One has two buses available for the GPU to access main system memory. The first of these is GARLIC, which is designed to reduce latency. It’s a four-channel bus, running at a peak bandwidth of 68GB/s (limited by DRAM memory bandwidth). These four GARLIC channels connect directly to each of the Xbox One’s four DRAM controllers. For reference, the GARLIC bus on the PlayStation 4 is also limited by its own DRAM, but because it uses the faster GDDR5 RAM, its limit is 176GB/s.
The second bus is ONION, and it’s coherent. It’s a two-channel bus, capable of 24 GB/s read or 16 GB/s write (peak). This limitation is due to the North Bridge. The two ONION controllers connect to the memory controller. Thus the total coherent memory bandwidth through the North Bridge is limited to 30GB/s.