The PlayStation 4 flexed its graphical prowess with the launch game Killzone Shadow Fall, undoubtedly one of the most impressive graphical debuts of any console yet. As many know, console launch titles are often either rush jobs or less graphically impressive than you’d expect from a next-generation system, because the developers are still learning their way around the hardware. With this said, Guerrilla Games have done an amazing job using the PS4’s hardware for the first time.
The purpose of this article is to demonstrate how, despite years of knowledge of the intricate workings of the PS3’s hardware, the team managed so readily to surpass that earlier work with Shadow Fall. It will also give you some idea of how PS4 software will evolve over time, as teams better learn the intricacies of the hardware and development tools.
Several months ago, Guerrilla released two post-mortem PDFs on the Killzone Shadow Fall demo, in which they commented on their findings during the transition to the next-gen platform. There was the Shadow Fall technology demo post-mortem and a lighting post-mortem (which I analyzed in full; check the relevant links for much more info). These served as yardsticks and information not only for Guerrilla’s own team, but also for other developers working on Sony’s new (and at the time unreleased) machine.
The new accomplishments vs the old masterpiece
Before we compare KZ3 and Shadow Fall, we need to understand what challenges the team faced with KZ3 on the PS3. Many would argue that Killzone 3 is the greatest technical accomplishment on Sony’s last-gen machine. Guerrilla set lofty goals for KZ3, wishing to fix issues present in its predecessor (such as control latency) and yet make an even more graphically impressive title. This forced the team to eke every last bit of performance out of the system, making liberal use of the PS3’s SPUs (more on this later) and new rendering techniques to produce a finished product which surpasses Killzone 2 on all fronts, particularly in stable frame rates and control latency, the latter having been a fairly notorious issue in multiplayer.
Solving the PS3’s Memory Issues
Sony’s PlayStation 3, much like the Xbox 360, was given a total of only 512MB of memory, discounting caches: 256MB of XDR RAM for main memory and an additional 256MB of GDDR3 for graphics. Once you account for OS (operating system) overheads (52MB of main memory alone), this quickly leaves game developers struggling to make the most of the remaining RAM. The system was also somewhat limited in memory bandwidth: the Cell’s XDR bandwidth was around 25GB/s, while the GDDR3 bandwidth of the RSX (the PS3’s graphics processing unit) was around 22.4GB/s. Both figures are theoretical peaks, and real-world throughput was likely lower.
Once a level is loaded, it is split into various ‘zones’. How it works is fairly simple to understand: while you play through Zone A, the game loads the data for Zone B into memory in the background. As you progress through the level and eventually enter Zone B, the system dumps (erases) Zone A from memory, freeing the space needed to start loading the data for Zone C. This process continues throughout the level, helping to minimize the game’s loading times. You’ll see something similar during cutscenes, which serve both to tell the story and to mask loading.
For Killzone 3, Guerrilla made great use of a streaming system for the game’s music and sound, claiming that over 95 percent of the sound assets in KZ3 are stored on the system’s hard disk and streamed as and when needed. This includes ambient sounds, character dialogue and of course the music. The Blu-ray drive simply isn’t fast enough to stream all of this data, so the decision was made to store it on the hard disk. The purpose is simple: it frees up memory (reducing the memory budget of the audio assets), while still allowing Guerrilla to avoid limiting the range of sounds or compromising with lower bit rates.
The team also employed texture streaming. For their previous work on the PS3 (Killzone 2), Guerrilla hadn’t yet implemented such a system. Textures can quickly eat up huge amounts of a game’s memory (50 percent or more), which is less than ideal given the PS3’s already tight memory constraints. Guerrilla set about solving this for Killzone 3 by using mipmap chains and loading the highest-quality mip levels only when and if required.
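The zone hand-off described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Guerrilla’s code: the ZoneStreamer class, its method names and the fake "assets" payload are all hypothetical, and a real engine would perform the loads asynchronously.

```python
from typing import Dict, List


class ZoneStreamer:
    """Hypothetical sketch: keep the current zone resident and prefetch the next."""

    def __init__(self, zone_order: List[str]) -> None:
        self.zone_order = zone_order
        self.resident: Dict[str, bytes] = {}

    def _load(self, zone: str) -> None:
        # Stand-in for a background read from disc or hard disk.
        if zone not in self.resident:
            self.resident[zone] = f"assets for {zone}".encode()

    def enter(self, zone: str) -> None:
        index = self.zone_order.index(zone)
        self._load(zone)  # normally already prefetched
        # Dump every zone except the one the player is now in.
        for name in list(self.resident):
            if name != zone:
                del self.resident[name]
        # Use the freed memory to start loading the following zone.
        if index + 1 < len(self.zone_order):
            self._load(self.zone_order[index + 1])


streamer = ZoneStreamer(["A", "B", "C"])
streamer.enter("A")
print(sorted(streamer.resident))  # ['A', 'B']
streamer.enter("B")
print(sorted(streamer.resident))  # ['B', 'C']
```

At no point are more than two zones held in memory, which is the whole point of the scheme on a machine with so little RAM.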
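To get a feel for why holding back the top mip level matters, here is a rough Python calculation. It assumes an uncompressed 1024×1024 texture at 4 bytes per texel, which is my own simplification (real games generally use compressed formats); the function name is hypothetical.

```python
def mip_chain_sizes(width, height, bytes_per_texel=4):
    """Return the size in bytes of every mip level, largest first."""
    sizes = []
    while True:
        sizes.append(width * height * bytes_per_texel)
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return sizes


sizes = mip_chain_sizes(1024, 1024)
top_level = sizes[0]
rest_of_chain = sum(sizes[1:])
print(top_level)       # 4194304 bytes for the full-resolution level
print(rest_of_chain)   # 1398100 bytes for all the smaller levels combined
```

The highest-quality level alone accounts for roughly three quarters of the whole chain, so leaving it out until the texture is actually seen up close saves most of that texture’s memory.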
KZ Shadow Fall & PS4 Memory… err no issues
With the PlayStation 4, many of these issues weren’t present. Instead, the team were given 8GB of GDDR5 memory to play around with. We know that the PS4 does have an OS overhead, but this still leaves 6.5GB of RAM available to game developers (using the PlayStation 4’s flexible memory system).
On top of the main memory figures seen above, a further 3GB of RAM was used for the Shadow Fall demo’s video assets (to put that into perspective, that’s six times the entire amount of memory inside the PlayStation 3). The team are given much more freedom thanks to the abundance of GDDR5 memory in the PlayStation 4. That’s not to say that they can waste RAM or skip optimization, but with reasonably careful management it’s a lot less of an issue. As you can see from the lower image, memory usage in the game changed over time, as did the optimization of the game’s assets.
The team also commented on the PS4’s GDDR5 memory bandwidth, mentioning that post-processing can often be bandwidth-bound: “Performance scales linearly with texture format size. We switched from RGBA16F to smaller minifloat or integer formats”. They added: “GDDR5 bandwidth is awesome! If you map your memory properly, Use the smallest pixelformat for the job”
The PlayStation 4’s memory bandwidth is 176GB/s in theory, with about 172GB/s usable according to one game developer: “It means we don’t have to worry so much about stuff, the fact that the memory operates at around 172GB/s is amazing, so we can swap stuff in and out as fast as we can without it really causing us much grief…” – Stewart Gilray.
The PS4’s hUMA (Heterogeneous Unified Memory Architecture) provides ease of management. Developers are free to divide up the PS4’s resources as they see fit, and are therefore better able to implement their vision than they were within the cramped and restrictive confines of the PS3. The other benefit of a unified memory address space is that the GPU’s compute shaders can process a piece of data and then hand it straight back to the CPU. This eliminates costly copy operations, and also saves memory bandwidth and optimization time.
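The “scales linearly with texture format size” remark is easy to put numbers on. The sketch below compares how many bytes a single full-screen 1080p pass touches in different pixel formats. The choice of R11G11B10F and RGBA8 as the “smaller” formats is my assumption for illustration; the quote does not say which formats Guerrilla picked.

```python
# Bytes per pixel for a few common render target formats.
BYTES_PER_PIXEL = {
    "RGBA16F": 8,      # four 16-bit floats
    "R11G11B10F": 4,   # packed minifloat format (assumed example)
    "RGBA8": 4,        # four 8-bit integers (assumed example)
}


def buffer_bytes(width, height, pixel_format):
    """Bytes read (or written) when touching every pixel of the buffer once."""
    return width * height * BYTES_PER_PIXEL[pixel_format]


wide = buffer_bytes(1920, 1080, "RGBA16F")
narrow = buffer_bytes(1920, 1080, "R11G11B10F")
print(wide, narrow, wide / narrow)  # 16588800 8294400 2.0
```

Halving the format size halves the traffic for every read and write of that buffer, which is why a bandwidth-bound post-processing pass speeds up in direct proportion.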
PS3’s SPUs & PS4’s Asynchronous Compute
The PS3’s SPUs (Synergistic Processing Units) can be used for a variety of graphical or parallel processing tasks. As a brief introduction, they run at 3.2GHz, the same clock as the rest of the Cell processor (the PS3’s CPU), and 6 of them are available to games. Each SPU puts out 25.6 GFLOPS of single-precision computing performance. For more on the Cell processor’s workings click here.
The SPUs on the PS3’s Cell processor take care of a variety of tasks in Killzone 3, including post-processing work such as Morphological Anti-Aliasing. The SPUs are also used for object culling: they render simplified level geometry into a depth buffer, then check each object’s bounding box against a scaled-down version of that depth buffer. This determines whether the object is visible, and thus whether the GPU should spend time drawing it. For Killzone 2, the team estimated they’d used about 60 percent of the PS3’s SPU time. With Killzone 3, the Guerrilla team knew they needed to tap further into these resources to improve on their previous work.
Guerrilla weren’t the only ones to rely heavily on the PS3’s SPUs. The primary issues with the SPUs were the optimization they required and the management of their memory. Naughty Dog even posed the following question in one of their own white papers on SPU optimization: “Why on earth can’t the compiler do a better job?”
Naughty Dog had used the Synergistic Processing Units even in the original Uncharted for numerous tasks, including lighting calculations. Each SPU featured its own 256KB of local memory, which required careful management, but the bigger issue was accessing external memory, which required DMA (Direct Memory Access). The SPUs operated best on contiguous chunks of data. Much like the PlayStation 4, the SPUs could handle asynchronous computing, where multiple SPUs would work together on a task. Where does this leave us then? Quite simply, the SPUs weren’t the easiest hardware to work with and, like the rest of the PS3’s Cell processor, they were in-order processors, thus requiring a lot of work from both hand optimization and the compiler.
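As a rough illustration of that visibility test, the Python sketch below checks an object’s screen-space bounding rectangle against a small depth buffer. Everything here is an assumption made for the example: the function name, the convention that smaller depth values are nearer, and the use of a plain list of lists. The real SPU code is certainly far more optimized.

```python
def is_visible(depth_buffer, rect, nearest_depth):
    """Return True if any pixel under rect is not covered by nearer geometry.

    depth_buffer: rows of depth values from the simplified occluder pass.
    rect: (x0, y0, x1, y1) inclusive pixel bounds of the object's bounding box.
    nearest_depth: depth of the closest point of the bounding box.
    """
    height = len(depth_buffer)
    width = len(depth_buffer[0])
    x0, y0, x1, y1 = rect
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width - 1, x1), min(height - 1, y1)
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if nearest_depth <= depth_buffer[y][x]:
                return True  # the object could show through at this pixel
    return False  # every covered pixel has something nearer in front


# 4x4 buffer: a wall at depth 0.2 fills the two left-hand columns.
depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
print(is_visible(depth, (0, 0, 1, 3), 0.5))  # False: fully behind the wall
print(is_visible(depth, (1, 0, 2, 3), 0.5))  # True: pokes out past the wall
```

Objects that fail the test never reach the GPU at all, so the cost of the check is paid on otherwise idle SPU time rather than on the graphics chip.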
With the PS4, parallel computing now runs on the GPU using the GCN architecture’s compute structure (the Asynchronous Compute Engines, or ACEs if you prefer). The PS4’s ACEs are a beefed-up version of those found in AMD’s Radeon HD 7000 series (the Tahiti variant). Sony worked with AMD to increase the queue structure to 64 compute queues to better schedule compute tasks. Indeed, AMD have commented previously that the increased ACE and queue architecture for the PS4 was something Sony came up with. AMD also used something similar for their Volcanic Islands GPUs; for more on how the PS4’s GPU and Volcanic Islands are so similar, check out this article.
The volatile bit is proving extremely useful when dealing with compute commands, which share the same Level 2 cache as the GPU’s regular graphics work. The data these compute commands touch is effectively ‘tagged’ with a bit, and when it’s no longer needed, just that tagged data can be selectively erased. This is much more efficient than dumping the entire Level 2 GPU cache.
It is true that the GCN architecture remains an in-order design. From AMD’s own white paper on GCN: “Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront…” However, it works within a much better memory architecture, has better instruction sets, and is overall a much easier technology to understand.
During the creation of the Killzone Shadow Fall demo, much of the GPU compute work had yet to be implemented, and compute was only being used for memory defragmentation (which supports texture streaming). Later on, this was expanded, with Guerrilla using compute for color correction and force fields. Due to time constraints, they didn’t shift much more work to the GPU’s compute engines.
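A toy model makes the benefit easier to see. The Python class below is purely illustrative and bears no relation to how the hardware is actually built: ToyL2Cache and its methods are hypothetical names, and a real cache works on fixed-size lines rather than a dictionary.

```python
class ToyL2Cache:
    """Toy model only: each address maps to (value, volatile_flag)."""

    def __init__(self):
        self.lines = {}

    def write(self, address, value, volatile=False):
        self.lines[address] = (value, volatile)

    def invalidate_volatile(self):
        """Drop only the entries tagged volatile; return how many were dropped."""
        tagged = [addr for addr, (_, flag) in self.lines.items() if flag]
        for addr in tagged:
            del self.lines[addr]
        return len(tagged)

    def flush_all(self):
        """The expensive alternative: throw away everything in the cache."""
        dropped = len(self.lines)
        self.lines.clear()
        return dropped


cache = ToyL2Cache()
cache.write(0x10, "texture data")
cache.write(0x20, "vertex data")
cache.write(0x30, "compute result", volatile=True)
print(cache.invalidate_volatile())  # 1: only the compute entry is removed
print(len(cache.lines))             # 2: the graphics data stays cached
```

With a full flush, the graphics work would have to fetch its texture and vertex data from memory all over again; with the selective version it carries on undisturbed.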
During an interview with Official PlayStation Magazine, Eric Boltjes commented that the current upper limit for enemy AI is 24: “Yes. It’s about 24, but that’s just enemy AI. And there are other types of enemy AI, lots of destructibility, lots of dynamic objects. Those have much higher limits but the amount of AI is around 24”.
“[when you encounter more than 24 enemies you get] Framerate drops. They’re all autonomous, they create their own effects. They’re busy little bees,” he says. You’re not just getting 24 Helghast on screen, you’re also getting all their additional effects and interactions, and it’s this that means, “as soon as you push it over  then we get performance issues”.
Guerrilla have admitted that with more time, they’d have liked to shift more onto compute. Compute is particularly good at handling AI, physics and other parallel tasks. The enemy AI, however, ran on the PlayStation 4’s AMD Jaguar CPU. There were 6 threads available to the game, as you can see in the image below from their own PDF.
While the PS4’s CPU isn’t as beefy as a traditional desktop CPU, it is efficient. Sony haven’t commented on the clock speed yet, but if it runs at the assumed 1.6GHz, it puts out around 102 GFLOPS of computing power across all 8 cores. As we’ve discussed, 6 of those cores are usable by game developers. It’s clear that optimization is still required to get the most out of the CPU, and that compute will be vital in the future for certain tasks.
For the PS4 and Killzone Shadow Fall, Guerrilla used a main orchestrator thread, which scheduled work across the other 5 cores. Any of those cores could pick up the code when it was free, which made for more efficient running of the game’s code and allowed considerably more code to be run in jobs. Comparing the PS3 to the PS4, Guerrilla say in their PDF that the percentage of code running in jobs went from 80 to 90 percent for rendering code, from 10 to 80 percent for game logic and from 20 to 80 percent for AI. The team focused first on high-level optimization and won back 50 percent of the PS4’s CPU time just by fixing high-level code.
Because the PS4’s CPU is an OoO (out-of-order) processor, it relies less on the compiler to tell it what to do. Instead, it can reorder instructions as their data becomes available, which is great for optimization. Meanwhile, the orchestrator thread tells the other CPU cores what to do, so you’re much more likely to achieve high utilization of the cores rather than having one sit idle because it has nothing to do. This way, the CPU can always be processing something, which means that in theory you’ll be getting the best performance per frame.
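For anyone wondering where the 102 GFLOPS figure comes from, it is simply cores × clock × operations per cycle. The short Python check below assumes 8 single-precision floating point operations per cycle per Jaguar core and the rumored 1.6GHz clock; both inputs are assumptions rather than figures Sony has confirmed, and the function name is my own.

```python
def peak_gflops(cores, clock_mhz, flops_per_cycle):
    """Theoretical peak in GFLOPS; real workloads will land well below this."""
    return cores * clock_mhz * flops_per_cycle / 1000


# Assumed inputs: 1600 MHz clock, 8 single-precision FLOPs per cycle per core.
print(peak_gflops(8, 1600, 8))  # 102.4 for all eight cores
print(peak_gflops(6, 1600, 8))  # 76.8 for the six cores games can use
```

So the slice of CPU throughput a game actually has to work with is closer to 77 GFLOPS in theory, which helps explain why moving parallel work over to GPU compute is so attractive.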
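The general shape of such a job system can be sketched with Python’s standard threading and queue modules. This is a structural illustration only: run_frame and the job names are hypothetical, Guerrilla’s engine is written in native code, and Python threads do not deliver true parallelism for CPU-heavy work.

```python
import queue
import threading


def run_frame(jobs, worker_count=5):
    """The calling thread acts as orchestrator; workers pull jobs when free."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work this frame
                pending.task_done()
                return
            name, func = item
            value = func()
            with lock:
                results[name] = value
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:           # orchestrator hands out the frame's work
        pending.put(job)
    for _ in threads:          # one sentinel per worker
        pending.put(None)
    pending.join()             # wait until every job has been processed
    for thread in threads:
        thread.join()
    return results


frame_jobs = [
    ("ai", lambda: sum(range(100))),
    ("physics", lambda: 2 ** 10),
    ("render_prep", lambda: "ready"),
]
print(run_frame(frame_jobs))
```

Because no job is tied to a particular worker, whichever core frees up first takes the next piece of work, which is what keeps utilization high.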
Killzone 3 used a rendering target of 720P, locked to 30FPS, using deferred lighting. Anti-aliasing was handled by Morphological Anti-Aliasing (MLAA). As mentioned earlier, this technique ran entirely on the PlayStation 3’s SPUs and is touted as an ‘intelligent edge blurring filter’. It was 30 percent cheaper than the very different anti-aliasing approach used for Killzone 2. That title instead ran at 2560×720 and then used Quincunx to downsample to the target resolution (1280×720). This means that the scene was effectively rendered at twice the horizontal resolution and then scaled down, which is extremely expensive to render.
With Killzone Shadow Fall, Guerrilla went in a different direction. The initial demo footage ran at 1080P, with a target frame rate of 30FPS and a form of anti-aliasing known as FXAA. FXAA, or Fast Approximate Anti-Aliasing, is often looked down on in the gaming community due to the blurring it can cause, with textures particularly susceptible. FXAA was originally created by Nvidia (see their white paper here) as a ‘low cost’ anti-aliasing solution. In many titles, such as Crysis 3 on the PC, performance is barely affected by this form of AA.
But for the sake of image quality, Guerrilla opted not to use FXAA by itself in the final release of Killzone Shadow Fall, and added TMAA into the mix. This isn’t mentioned in the post-mortem PDFs, but it is on Michiel van der Leeuw’s Twitter: “We use FXAA + TMAA now, will dig deeper in a future publication, but it does complement each other.” He also said: “Tried a few things, and multisampling didn’t make the cut (costly, not the biggest impact on quality). More on that later”.
Their final solution, TSSAA (Temporal Super-Sampling Anti-Aliasing), is used alongside FXAA and gives a more refined and effective result than FXAA alone. It uses previously drawn frames as a reference, which saves the GPU from needing to draw multiple samples per pixel within a single frame. TSSAA is clearly a ‘cheaper’ form of anti-aliasing, and the eagle-eyed will still notice some aliasing or shimmering. While MSAA (Multi-Sample Anti-Aliasing) is prettier to look at, it is extremely bandwidth hungry and needs quite a bit of GPU power too.
In the final game, the frame rate is unlocked, despite early rumors that it was locked at 30FPS. This means that in ideal scenarios the game will display at anywhere between 30 and 60FPS. The higher the frame rate, the less time the GPU has to draw each frame. At 30FPS the PS4 has 33.3ms to draw each frame, which is halved to 16.7ms at 60FPS. Despite the PS4’s 1.84 TFLOPS of computing power, the GPU already has its work cut out for it at a resolution of 1920×1080 (1080P). Just like all development teams, Guerrilla went with what they thought was the best decision for their hardware.
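The cost of that approach is clearer when you count pixels. The quick Python comparison below uses the 2560×720 figure quoted above, the standard 1280×720 target and, for reference, a full 1920×1080 frame; the helper name is my own.

```python
def pixel_count(width, height):
    return width * height


supersampled = pixel_count(2560, 720)   # Killzone 2 figure quoted above
target_720p = pixel_count(1280, 720)    # standard 720p output
full_hd = pixel_count(1920, 1080)       # 1080p, for comparison

print(supersampled / target_720p)  # 2.0: twice the pixels to shade
print(full_hd / target_720p)       # 2.25: 1080p versus 720p
```

Doubling the horizontal resolution doubles the number of pixels the GPU must fill every frame, so a post-process filter working on a single 720p image is an obvious saving.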
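To show the basic idea of reusing earlier frames, here is a deliberately simple Python sketch of temporal accumulation for a single pixel. It is not Guerrilla’s method, which has not been published in detail: the exponential blend, the 0.1 blend factor and the function name are all assumptions of mine, and real implementations also have to reproject the previous frame to account for camera and object motion.

```python
def accumulate(samples, blend=0.1):
    """Blend each new frame's sample into a running history value."""
    history = samples[0]
    for sample in samples[1:]:
        history = blend * sample + (1.0 - blend) * history
    return history


# A pixel on a hard edge: tiny frame-to-frame offsets make it land
# on the dark side (0.0) one frame and the bright side (1.0) the next.
edge_samples = [0.0, 1.0] * 100
resolved = accumulate(edge_samples)
print(round(resolved, 2))  # settles near the middle instead of flickering
```

Each frame still only computes one sample per pixel, yet over time the pixel settles close to the half-covered value that multiple samples in a single frame would have produced.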
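The frame time figures are just 1000 milliseconds divided by the frame rate, as this small Python helper (a name of my own choosing) shows.

```python
def frame_budget_ms(fps):
    """Milliseconds available to produce one frame at the given frame rate."""
    return 1000.0 / fps


for fps in (30, 60):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
```

This prints 33.3 ms for 30FPS and 16.7 ms for 60FPS, so every effect the GPU runs has to fit into half the time if the game is to hold the higher rate.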
Part Two – Coming Soon
Join me soon for part two, where we’ll discuss character models, sound, particle systems and frame rates in depth.
Guerrilla Games publications – CreationOfKillzone3.pdf
Guerrilla Games Shadow Fall PDF
Official PS Magazine link for Eric Boltjes interview
AMD Radeon Graphic Core Next Architecture White Paper
Volatile Bit info from Mark Cerny’s interview with Gamasutra