PS4 Architecture Naughty Dog SINFO Analysis & Technical Breakdown

New light has been shed on the PlayStation 4’s architecture thanks to Naughty Dog’s lecture at SINFO. The lecture is pretty damn technical, and several viewers and readers requested that I do a technical breakdown of it. So here I sit with a pack of Lockets, hot drinks and a stack of white papers and technical reference documents.

I feel a little technical background on the PlayStation 4 is required before we begin breaking down Naughty Dog’s SINFO presentation. It’s a prerequisite for understanding how everything works, and while many of you may already know the info, a quick refresher doesn’t hurt, does it?

Refresher on the PS4’s technical specs

The PlayStation 4 is of course the successor to the PS3, and despite the two architectures being quite different, the PS4 is a logical next step. The main CPU is an x86-64 CISC (Complex Instruction Set Computing) AMD Jaguar design. The 64 part means (among other things) that the Jaguar can address more than 4GB of RAM. We’ve discussed CISC vs RISC in a previous article (Why the PS4 & Xbox One Moved to X86 CISC), and I’d highly recommend you check it out. Aside from that, the PS4’s CPU has 8 cores (6 available to games developers), arranged as two modules of four Jaguar cores each. We’re still unsure of the PS4’s clock speed, however.

AMD Bobcat APU specs vs the PS4’s AMD Jaguar APU specs. Note these specs are per four core Jaguar module.

The PS4’s GPU is based on AMD’s GCN (Graphics Core Next) architecture, and its compute structure is very close to that of the Volcanic Islands GPUs found inside AMD’s current generation cards such as the R9 290X. At its most basic, the GPU runs at 800MHz with 1152 Stream Processors (that’s 64 for each of the 18 CUs, or Compute Units). A little basic math then gives us 1.84 TFLOPS of compute power. To get that, we do the following calculation: 1152 x 800MHz x 2. To explain: you take the 1152 shaders, multiply by the clock speed (800MHz), and then multiply by two because each shader can perform two floating point operations per clock (a fused multiply-add counts as two). The GPU has 8 ACEs (Asynchronous Compute Engines), each able to manage up to 8 compute queues, for 64 in total. For more on the PS4’s GPU compute systems read here. For what it’s worth, this is advanced beyond the original GCN architecture seen in the Radeon 7000 series and the Xbox One, which featured 2 ACEs with 2 queues per ACE (4 total).
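The peak compute figure above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic quoted in the text; the constant names are my own, and the result is a theoretical peak rather than anything a real game will sustain.

```python
# Back-of-the-envelope peak throughput for the GPU figures quoted above.
COMPUTE_UNITS = 18
SHADERS_PER_CU = 64
CLOCK_MHZ = 800
FLOPS_PER_SHADER_PER_CLOCK = 2  # one fused multiply-add counts as two operations

shaders = COMPUTE_UNITS * SHADERS_PER_CU
peak_mflops = shaders * CLOCK_MHZ * FLOPS_PER_SHADER_PER_CLOCK
peak_tflops = peak_mflops / 1_000_000  # mega -> tera

print(f"{shaders} shaders, {peak_tflops:.2f} TFLOPS peak")
```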

Finally, we’ll discuss the PlayStation 4’s GDDR5 memory, which many would argue is the main advantage it has over its competitor, Microsoft’s Xbox One. GDDR5 RAM is typically used on high end PC GPUs. In the PS4’s case, the memory runs at 5500MHz (effective, 1375MHz actual) on a 256-bit memory bus. This provides the now well known 176GB/s of memory bandwidth. To calculate it you can do the simple maths: 5500MHz x 256 bits / 8. You divide by 8 to convert bits into bytes. There was some debate over whether GDDR5 has far higher latency than DDR3 RAM (which is slower and less expensive), but in a separate article we showed this isn’t the case.
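The same bandwidth sum, written out as a small Python sketch. The variable names are mine and the figure is the theoretical maximum for the quoted clock and bus width.

```python
# Theoretical peak memory bandwidth from the figures quoted above.
EFFECTIVE_RATE_MHZ = 5500  # effective transfer rate per pin
BUS_WIDTH_BITS = 256
BITS_PER_BYTE = 8

bandwidth_mb_s = EFFECTIVE_RATE_MHZ * BUS_WIDTH_BITS // BITS_PER_BYTE
bandwidth_gb_s = bandwidth_mb_s / 1000

print(f"{bandwidth_gb_s:.0f} GB/s")
```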

PlayStation 4 Memory Allocation

Memory has been a rather limiting factor in console games development for some time now, with the previous generation systems offering a meager 512MB total for games developers to work with. According to Naughty Dog’s Jason Gregory, memory continues to be a problem on modern consoles (including the PS4). As he puts it, PC memory appears virtually unlimited, while the PlayStation 4’s is very limited indeed: the PS4 has “about 5GB, which seems a lot, but it isn’t”. If you’re wondering how the PS4 has only 5GB when the console ships with 8GB of GDDR5, you’d be forgiven for forgetting Operating System overheads. The amount of RAM available on the PS4 has been discussed previously; this latest piece of info just adds further clarification.

[Figure: Naughty Dog slide on PS4 memory allocation]

Jason Gregory pointed out several times during the presentation that memory fragmentation is the enemy of performance, and that games require careful memory management. Relying on general purpose memory allocators isn’t ideal; handling the allocation of memory yourself is the way to go for the sake of performance. In C++ (a programming language used for many games, particularly on the PS4 and PS3), the general purpose allocators are part of the standard library, which is a collection of pre-written functions and classes. It’s better, however, to reserve your own carefully sized blocks of memory up front, and then allocate your various resources from within them.
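To make the idea of "reserve once, then allocate from within" a little more concrete, here is a toy linear (or "bump") allocator modelled in Python. It is purely illustrative: the class name, the 16 byte default alignment and the sizes are my own assumptions, and it is not a reconstruction of Naughty Dog’s allocators, which would be written in C++ against real memory.

```python
class LinearAllocator:
    """Toy bump allocator: hands out slices of one pre-reserved buffer."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)  # the single up-front reservation
        self.offset = 0                    # next free byte

    def alloc(self, size, align=16):
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + size > len(self.buffer):
            raise MemoryError(f"out of space: wanted {size} bytes")
        self.offset = start + size
        return memoryview(self.buffer)[start:start + size]

    def reset(self):
        # Freeing is all-or-nothing, so holes can never form.
        self.offset = 0


frame_heap = LinearAllocator(1024)   # hypothetical per-frame scratch heap
a = frame_heap.alloc(100)
b = frame_heap.alloc(40)
used_before_reset = frame_heap.offset
frame_heap.reset()                   # e.g. at the end of every frame
```

The trade-off is that individual allocations can’t be freed on their own, which is exactly why this style of allocator never fragments.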

[Figure: Naughty Dog’s memory fragmentation example]

It’s important to remember that each and every object in a game’s world (or helping to make the title run) requires a slice of the PS4’s memory: from the footsteps of your character walking, to the foliage you’re walking through, to the weapon models and the physics which hold the world you’re playing in together. If we look at the above example, each object in memory is a different size, because different assets won’t all be equal in size. For example, a bullet shell from your handgun isn’t going to take up the same amount of memory as the huge monster that thinks your adventurer’s face looks like a nice breakfast.

Because of this, memory fragmentation can mean that performance suffers because there isn’t a large enough contiguous block of memory to store the data. In the example we see above, ‘Q’ is larger than any of the available spaces and thus isn’t able to slot in, which results in a stall because either the information can’t be placed in memory, or the result can’t be copied into memory. This is particularly true when we consider the size of game worlds now, and even more so when we start to factor in GPGPU computing.
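The ‘Q’ situation is easy to reproduce in a small simulation. In the sketch below the heap layout, the hole sizes and the size of Q are all made-up numbers; the point is only that the total free memory can comfortably exceed a request while no single hole is big enough to hold it.

```python
def largest_free_block(free_blocks):
    """Size of the biggest contiguous hole, or 0 if there are none."""
    return max((size for _, size in free_blocks), default=0)


def first_fit(free_blocks, size):
    """Return the start address of the first hole that fits, or None."""
    for start, hole in free_blocks:
        if hole >= size:
            return start
    return None


# (start, size) of each free hole in a hypothetical 100-unit heap.
free_blocks = [(10, 8), (30, 12), (60, 6), (85, 10)]
total_free = sum(size for _, size in free_blocks)

q_size = 20  # the object we want to place
q_address = first_fit(free_blocks, q_size)

print(f"free: {total_free}, largest hole: {largest_free_block(free_blocks)}")
print("Q placed at", q_address)
```

Here 36 units are free in total, yet the 20-unit request fails because the largest hole is only 12 units wide.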

[Figure: Naughty Dog’s profiling tools on PS3]

Naughty Dog does take advantage of development kits, which in many cases come with far more RAM than the retail units. This is for bug testing and to help create the game without worrying about memory constraints. In other words, they can quickly mock something up to see if it works and then worry about optimization at a later date. By carefully naming and defining their RAM heaps, they avoid running into a nasty surprise late in the game’s development.

[Figure: Naughty Dog’s memory mapping]

This is a great example of how important memory remains in the next generation. Recently a few games developers, including the creators of the Havok engine, have stated that artists will quickly run out of memory. Carefully allocating each and every object and ensuring that each item is optimized remains a crucial task given the limited hardware available on consoles. Guerrilla Games previously mentioned, in their own post-mortem of the Killzone Shadow Fall technology, that they were using GPGPU to defragment memory.

PS4’s Multi Core Hardware

Games consoles have been multi-core for some time now, and developers being required to think in parallel isn’t anything new. To get the most out of the PS2, developers had to learn to spread code across all of the available processors, and this was taken much further with the PS3. The PS3 gave games developers the main PPU (Power Processing Unit), 6 SPUs (Synergistic Processing Units) and of course the GPU. The PS4, as we’ve mentioned above, features eight cores organized into two clusters, with the GPU able to assist in General Purpose tasks (GPGPU). The Jaguar is fairly slow compared to modern high performance PC hardware, so taking advantage of all the cores, and making sure they are always loaded up with tasks, is crucial. This is particularly true when you consider that two of the eight cores are reserved by the system, leaving you with six cores, which translates into six worker threads.

[Figure: Naughty Dog’s PS3 job system]

If you look above, you’ll see Naughty Dog’s own example from the PS3. In this case the PPU of the PS3’s Cell is allocating jobs to each of the SPUs (which are labelled zero through five). This process is known as ‘kicking’. As a ‘job’ finishes, another is assigned to the relevant SPU. Simply put, this is an efficient way to ensure that the hardware is always busy. Let’s assume for a moment that SPU three sat idle for half a second while waiting for instructions: a rather large amount of performance would effectively go to waste. It’s therefore critical to orchestrate your jobs efficiently.
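The general pattern of keeping a fixed set of workers fed from a shared job queue can be sketched in Python as follows. This is a generic worker-pool illustration, not Naughty Dog’s job system: the function names are my own, the six workers simply mirror the six cores mentioned earlier, and because of Python’s global interpreter lock the threads here show the scheduling pattern rather than real parallel speed-up.

```python
import queue
import threading

NUM_WORKERS = 6  # mirrors the six cores available to games


def worker(jobs, results, lock):
    while True:
        job = jobs.get()
        if job is None:      # sentinel: no more work for this worker
            break
        value = job()        # run the job as soon as the worker is free
        with lock:
            results.append(value)


def run_jobs(job_list):
    jobs = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for job in job_list:     # "kick" every job onto the shared queue
        jobs.put(job)
    for _ in threads:        # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return results


squares = run_jobs([lambda n=n: n * n for n in range(20)])
print(sorted(squares))
```

Because each worker pulls its next job the moment it finishes the previous one, no worker sits idle while there is still work in the queue.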

Part Two – PS4 GPU, Buses, CPU & Compute
