When it comes to Sony’s strategies for console optimization, ICE Team are usually the ones in the limelight. But in reality, other teams are also responsible for producing the tools, strategies and optimizations that improve the system’s performance over the long haul. ICE Team typically work on API-related ‘stuff’, including GPU drivers, API improvements and basic system-level optimizations. But you can’t just make tweaks blindly; you need to know what’s eating up performance and where there is room to improve. To do this, developers use what are known as profiling tools, which in a nutshell monitor what software is doing (such as the number of threads, the amount of RAM and the percentage of processing time it’s using) on a given piece of hardware, for example the CPU or the GPU of the PlayStation 4.
SN Systems (also known as the Razor team) are responsible for crafting these tools, creating the GPU and CPU performance analysis, debugging and profiling tools for the system. It’s important to be able to drill deeply into the performance graphs so that ICE Team (and other developers who are responsible for the PS4’s improvements) can figure out better, more efficient ways to do something. Cort Stratton (a member of Sony’s ICE Team) mentioned the responsibilities of SN Systems on Twitter, and there’s also a link to a brief interview with one of their members of staff, Tom Charlesworth, SN Systems’ chief technology officer.
He reveals that they first got involved with PlayStation 4 development way back in 2008, but clearly back then there wasn’t much to know. Indeed, that was around the same time they started on their PS Vita tools. All they knew is they’d have to “reboot” their existing tool chains. “By 2009 we were starting to have much more contact and context from the architects of PS4. We’ve had direct contact and engagement with Mark Cerny through the development of our Vita tools, and the PS4 tool chain as well. Initially we were there in a consultative role, being canvassed on the types of performance profiling hardware we would like to see in the next-gen PlayStation platform. Come 2010, we started migrating our PS Vita tool chain to PS4. At that point we’d been working on our Vita tools for up to two years, and were then ready to retarget them for PS4.”
“Our Vita profiler is called Razor, and that’s a joint GPU/CPU profiler. That was something new for us. We’ve taken that same technology, and moved it over to PS4. One of the problems we were faced with on PS4 was that the hardware profiling embedded within the SoC wasn’t so attractive, compared with what was available for Vita. We had to be a bit more creative in terms of solving problems similar to those we experienced with Vita, where we had hardware assistance. We had to solve the problems in software, but we’re very pleased with the results. It’s going to really help CPU engineers by letting them tune and speed up their code.”
Of course, the PS4’s SoC (System on Chip) is a single-die solution comprising an AMD Jaguar CPU (8 cores) and an AMD GCN-based GPU.
If you point your browser at their official website you’ll be able to check out their “technology” section, where you can see snippets of their toolset. Below we can see their Razor tool showing off its PC (Program Counter) sampling. This periodically records which instruction the CPU is executing and tallies how often each function turns up, so the functions that eat the most CPU time collect the most samples. “In the screenshot below, the dark green bars show which function the PC was in when the sample was taken,
doWorkB() are examples in this case.”
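To make the idea of PC sampling a bit more concrete, here is a minimal sketch in Python. It is not how Razor works internally; the addresses, the symbol table and the main/doWorkA names are invented for illustration (only doWorkB() appears in the quote), and a real profiler would grab its samples from a timer interrupt rather than a hard-coded list.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
# Every address here is made up for the example.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "doWorkA"),
    (0x1800, "doWorkB"),
]
END_OF_CODE = 0x1C00  # first address past the last function


def function_at(pc):
    """Return the function whose address range contains pc, or None."""
    if pc < SYMBOLS[0][0] or pc >= END_OF_CODE:
        return None
    starts = [start for start, _ in SYMBOLS]
    index = bisect_right(starts, pc) - 1
    return SYMBOLS[index][1]


def histogram(samples):
    """Count how many samples landed in each function."""
    return Counter(function_at(pc) for pc in samples)


# Pretend a periodic timer recorded the program counter at these moments.
samples = [0x1010, 0x1404, 0x1810, 0x1820, 0x1408, 0x1B00, 0x1830, 0x1844]
counts = histogram(samples)

# Functions with the most samples are where the CPU spent most of its time.
for name, hits in counts.most_common():
    print(f"{name:8s} {hits / len(samples):6.1%}")
```

The result is statistical: a function that runs only briefly between two samples may never show up at all, which is why sampling gives an overview rather than exact timings.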
Generally in programming it’s considered good practice to optimize the most frequently executed code first. There’s little point in prioritizing a function that executes once every minute over one that executes once every second. That’s exaggerated for effect, of course, but you get the idea. “The statistics provided by PC sampling give a good overview of application performance, but sometimes it is preferable to get exact timings for each and every function. This is where Function-level profiling, and more specifically, function instrumentation, comes in.”
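Function instrumentation can be sketched in a few lines of Python. This is only an illustration of the general technique, not SN Systems’ implementation: the STATS table, the instrumented decorator and the two do_work functions are all hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical layout: function name -> [call count, total seconds].
STATS = defaultdict(lambda: [0, 0.0])


def instrumented(func):
    """Wrap func so that every call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so no call goes unrecorded.
            elapsed = time.perf_counter() - start
            entry = STATS[func.__name__]
            entry[0] += 1
            entry[1] += elapsed
    return wrapper


@instrumented
def do_work_a():
    return sum(range(1000))


@instrumented
def do_work_b():
    return sum(i * i for i in range(5000))


for _ in range(3):
    do_work_a()
do_work_b()

for name, (calls, seconds) in STATS.items():
    print(f"{name}: {calls} calls, {seconds * 1e6:.1f} us total")
```

Unlike sampling, every single call is recorded, but the wrapper adds a little work to each call, so instrumenting in software always costs some performance.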
Speaking of function instrumentation, some hardware goes a step further. The PS Vita development kit was an example given by SN Systems. The Vita provides absolute timings for each function running on the hardware. This allows them to basically graph out what’s going on and when, and it doesn’t ‘cost’ the game any performance while it’s doing so.
A challenge for consoles, aside from limited CPU and GPU resources, is storage. Quite simply put, this means memory. You can’t pop open your console and stick in an extra 8GB of RAM (well, technically some consoles can be modded, such as the original Xbox, but you’d better be familiar with a soldering iron!). So with that in mind, there’s a limit on how much ‘stuff’ you can squeeze into RAM. Textures, audio, render targets, everything takes memory. Therefore, everything needs to be compressed and made as small and efficient as possible.
Dead stripping might sound like something which belongs in a seedy part of Dead Space’s Ishimura, but in reality it’s simply a term for removing any dead (unreferenced) code. The basic premise is that applications are generally built from functions and code blocks. Certain functions will be ‘called’ (in the simplest terms, asked to run) at certain points. If they’re never asked to execute, they never do anything, yet they still take up space in the finished program.
“Dead-stripping may also allow programs to link successfully by removing any unused code which may refer to an undefined symbol, instead of resulting in a link error. Dead-stripping is not limited to only removing unused code from a file. The linker also removes any unused symbols and data from data blocks. Such symbols might include global variables, static variables, and string data.”
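As a rough sketch of the idea, dead stripping can be thought of as a reachability walk: start from the program’s entry point, follow every reference, and throw away whatever was never visited. The symbol names and the reference graph below are invented for illustration, and a real linker works on object-file sections rather than a Python dictionary.

```python
# Hypothetical reference graph: symbol -> symbols it refers to.
REFERENCES = {
    "main": ["update", "render"],
    "update": ["clamp"],
    "render": ["clamp", "title_text"],
    "clamp": [],
    "title_text": [],                  # string data used by render
    "debug_dump": ["missing_logger"],  # never referenced from main
    "unused_table": [],                # global data nobody reads
}


def live_symbols(references, entry_points):
    """Return every symbol reachable from the entry points."""
    live = set()
    pending = list(entry_points)
    while pending:
        symbol = pending.pop()
        if symbol in live:
            continue
        live.add(symbol)
        # A symbol with no definition has no outgoing references to follow.
        pending.extend(references.get(symbol, []))
    return live


live = live_symbols(REFERENCES, ["main"])
dead = sorted(set(REFERENCES) - live)
# Reachable symbols without a definition would still be a link error.
undefined = sorted(s for s in live if s not in REFERENCES)

print("kept:     ", sorted(live))
print("stripped: ", dead)
print("undefined:", undefined)
```

Note that debug_dump refers to missing_logger, which is not defined anywhere. Because debug_dump itself is stripped, that dangling reference never has to be resolved, which is the situation the quote describes.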
Then there’s de-duplication, which is probably the simplest to understand. It looks for code that appears more than once and, where possible, will nuke the duplicates. Once again, this makes the code tidier, smaller and more efficient.
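Here is a minimal sketch of de-duplication, assuming the simplest possible rule: two functions count as duplicates only if their compiled bytes are identical. The function names and byte strings are made up, and real linkers (which often call this identical code folding) have to be more careful than this.

```python
# Hypothetical compiled functions: name -> machine-code bytes (invented).
FUNCTIONS = {
    "get_width":  b"\x8b\x47\x08\xc3",
    "get_height": b"\x8b\x47\x0c\xc3",
    "get_size":   b"\x8b\x47\x08\xc3",  # byte-identical to get_width
    "get_count":  b"\x8b\x47\x08\xc3",  # byte-identical to get_width
}


def fold_duplicates(functions):
    """Map each function name to the name of the single copy that is kept."""
    kept_by_body = {}
    alias = {}
    for name, body in functions.items():
        # The first function seen with a given body becomes the kept copy.
        canonical = kept_by_body.setdefault(body, name)
        alias[name] = canonical
    return alias


alias = fold_duplicates(FUNCTIONS)
kept = set(alias.values())
before = sum(len(body) for body in FUNCTIONS.values())
after = sum(len(FUNCTIONS[name]) for name in kept)

print(alias)
print(f"{before} bytes -> {after} bytes")
```

Every call to get_size or get_count would simply be pointed at get_width, so three copies of the same code shrink to one.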
Looking at the above table, it’s pretty easy to see the amount of space that’s saved with each step. It starts with the default release build (in other words, one that hasn’t been optimized yet), and then each of the steps along the way knocks down the size. Middleware demos get the most mileage, dropping by over 17 percent, whereas a PS4 release can be reduced by almost 10 percent. This doesn’t just save space in RAM, it also reduces the size on the disc (cutting down loading times), frees up memory bandwidth (less code to push through the bus) and has the added benefit of being ‘cleaner’.
While shaving off a few MB here, or cutting down the execution time of a function by, say, 15 milliseconds there, might not seem like a big deal, it is. Reducing the size of an application can drastically speed things up, particularly if it allows a function (or more functions) to fit in the CPU cache. The cache, don’t forget, is much, much faster than the PS4’s GDDR5 RAM, allowing the CPU to access the data pretty much instantaneously. For more info, check out our analysis of a Naughty Dog discussion. Considering that a frame rate of 30 FPS means each frame must be rendered in an average of 33.33 ms, or in the case of 60 FPS, 16.67 ms, you can see how things make a major difference pretty quickly.
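The frame-time arithmetic is easy to check for yourself. The short Python sketch below uses hypothetical helper names and simply works out the per-frame budget, then shows how much of that budget the 15 milliseconds mentioned above would represent.

```python
def frame_budget_ms(fps):
    """Milliseconds available to produce one frame at the given frame rate."""
    return 1000.0 / fps


def share_of_frame(cost_ms, fps):
    """Fraction of a single frame's budget taken up by cost_ms."""
    return cost_ms / frame_budget_ms(fps)


for fps in (30, 60):
    budget = frame_budget_ms(fps)
    share = share_of_frame(15.0, fps)
    print(f"{fps} FPS: {budget:.2f} ms per frame, 15 ms is {share:.0%} of it")
```

In other words, 15 ms is a little under half of a 30 FPS frame and the vast majority of a 60 FPS one, so a saving of that size is anything but small.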
The PS4’s CPU is likely to be the lowest-performing part of the system in the long run, but for more info on its performance check out our exclusive interview with Allegorithmic.