Sony PlayStation 3 – Post-Mortem Part 1 – The Cell Processor

Sony’s PlayStation 3 hit the shelves in November 2006 and has gone on to sell over 75 million systems. It has provided a platform for some of the best games of this generation, including Uncharted, God of War and, of course, The Last of Us. But soon after the PlayStation 3 was released, Sony’s console hardware team began taking the system apart again to examine what they had done right, and what they had done wrong, in its design.

The PlayStation 3 had numerous successes, but it also had several problems which, despite fantastic programming, remained throughout the life of the system.

The PS3 pairs the Cell processor with an Nvidia RSX “Reality Synthesizer” graphics chip, 256MB of system RAM and a further 256MB of RAM for graphics. Unlike the Xbox 360, where a hard drive depended on which model you bought, every PS3 came with one as standard, along with a Blu-ray drive. Blu-ray’s slower loading compared to traditional DVD discs prompted Sony to include the hard drive. One thing PlayStation 3 owners will be familiar with is mandatory installs. Metal Gear Solid 4, for example, forced several of these installs on you at various points in the game, with Snake appearing on screen, smoking, while the game slowly installed its next part onto the hard drive.

So the question is: where did the PlayStation 3 go wrong, and what did it do right?

The Cell Processor

Sony’s Cell processor was based in many ways on the Power and PowerPC architectures. It was a RISC (Reduced Instruction Set Computing) processor, and it had several components developers had to become familiar with to get the most out of the machine.

The PPE (Power Processor Element) could be considered the main core of the processor, and was very similar in design to a traditional Power CPU. The core could handle 2 hardware threads and ran at a rather nippy 3.2GHz. It came with 512KB of level 2 cache to help keep the CPU supplied with a constant stream of data and ensure that instructions were always on hand. The PPE was an in-order processor, which presented several problems. Being “in-order” means that the processor executes instructions strictly in the order they appear in the program: if one instruction has to wait, for example for data that isn’t in the cache yet, every instruction behind it waits too, even if it doesn’t depend on that data. In a perfect world, the compiler (which turns the game’s source code into the binaries that make up the finished game) would always arrange instructions so that nothing ever has to wait, but in reality this is almost impossible to achieve. The end result is that one (or both) hardware threads can effectively be processing nothing while the CPU waits for the next link in the chain of data to arrive.
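To make the in-order problem more concrete, here is a small, generic illustration of latency hiding (not code from any PS3 title; the function names are hypothetical). The first function sums an array with one accumulator, so every addition depends on the result of the one before it and an in-order core must wait out each add's latency in turn. The second splits the work across four independent accumulators, which gives the pipeline unrelated instructions it can issue while earlier results are still in flight. Any actual speed-up depends on the compiler and the hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: every add depends on the result of the previous add,
// so the additions form a single serial dependency chain.
std::int64_t sum_single_chain(const std::vector<std::int32_t>& data) {
    std::int64_t total = 0;
    for (std::int32_t v : data) {
        total += v;
    }
    return total;
}

// Four accumulators: the four adds inside each iteration do not depend on
// one another, so an in-order pipeline can issue them back to back instead
// of waiting for each result in turn.
std::int64_t sum_four_chains(const std::vector<std::int32_t>& data) {
    std::int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    const std::size_t n = data.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    for (; i < n; ++i) {
        acc0 += data[i];  // leftover elements when n is not a multiple of 4
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Both functions return the same total (55 for the values 1 to 10); only the shape of the dependency chain differs. For integer sums an optimizing compiler may perform this rearrangement itself; for floating-point sums it generally cannot without relaxed-math options, because changing the order of additions changes the rounding.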

Just for reference, the PPE became the basis of the Xenon processor, which ended up in Microsoft’s Xbox 360. Microsoft had requested that the core be somewhat modified, and rather than using it in a Cell-like configuration, opted for a tri-core processor (three cores with 2 hardware threads each, meaning 6 hardware threads in total). The PlayStation 3’s PPE can handle about 25.6GFLOPS of computing performance.
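The 25.6GFLOPS figure is a theoretical peak, and it can be reproduced with simple arithmetic. The sketch below assumes the basis commonly used for this number: one 4-wide single-precision fused multiply-add per cycle, with each fused multiply-add counted as two floating-point operations. Working in MHz keeps everything in exact integer math; the function and constant names are hypothetical.

```cpp
#include <cstdint>

// Theoretical peak in MFLOPS = clock (MHz) x SIMD lanes x FLOPs per lane per cycle.
// Assumption: a 4-wide single-precision vector unit that completes one fused
// multiply-add (counted as 2 FLOPs) per lane per cycle.
constexpr std::int64_t peak_mflops(std::int64_t clock_mhz,
                                   std::int64_t simd_lanes,
                                   std::int64_t flops_per_lane_per_cycle) {
    return clock_mhz * simd_lanes * flops_per_lane_per_cycle;
}

// 3200 MHz x 4 lanes x 2 FLOPs = 25,600 MFLOPS, i.e. 25.6 GFLOPS.
constexpr std::int64_t kPpePeakMflops = peak_mflops(3200, 4, 2);
static_assert(kPpePeakMflops == 25600, "3.2 GHz x 4 lanes x 2 FLOPs = 25.6 GFLOPS");
```

A peak figure like this assumes every single cycle issues a useful fused multiply-add; stalls of the kind described for in-order execution mean real code reaches only a fraction of it.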

The Synergistic Processing Element

The SPEs (Synergistic Processing Elements) are vector processors that operate in a SIMD (Single Instruction, Multiple Data) fashion: a single instruction performs the same operation on several pieces of data at once. Each SPE has its own 256KB local store, and it can only work directly on code and data held there; anything else has to be copied in from main memory. The SPEs are told what to do by the PPE, which effectively delegates work to them. However, this delegation had to be planned and programmed explicitly by game developers, and many programmers have lamented how difficult the SPEs were to use. It’s well known that using the Cell’s SPEs can vastly improve game performance, and they can be called upon to perform a wide variety of different tasks. Running at the PS3’s 3.2GHz, each SPE could put out about 25.6GFLOPS. The PS3’s Cell processor features 8 SPEs, but 1 is disabled. This provides a fallback in case one of the SPEs doesn’t work after production, which increases the yield of usable chips. Of the remaining seven, one is reserved for the system software, leaving six for games.
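The division of labour described above, where the PPE hands out work and each SPE processes it inside its own local store, can be sketched in portable C++. This is a minimal sketch under stated assumptions: std::thread stands in for an SPE, a small fixed-size array stands in for its local store, and std::memcpy stands in for the DMA transfers between main memory and the local store. All names are hypothetical. Real SPE code issued DMA commands through the Cell SDK instead (for example mfc_get and mfc_put), which this does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical "local store" size for the sketch, in floats. A real SPE
// local store is 256KB and must also hold the SPE's own code and stack.
constexpr std::size_t kLocalStoreFloats = 1024;

// Worker standing in for one SPE: it copies ("DMAs") a chunk of main memory
// into its private buffer, processes it there, and copies the result back.
void spe_worker(float* main_memory, std::size_t begin, std::size_t end, float scale) {
    float local_store[kLocalStoreFloats];  // private to this worker
    for (std::size_t pos = begin; pos < end; pos += kLocalStoreFloats) {
        const std::size_t count = std::min(kLocalStoreFloats, end - pos);
        std::memcpy(local_store, main_memory + pos, count * sizeof(float));  // "DMA in"
        for (std::size_t i = 0; i < count; ++i) {
            local_store[i] *= scale;  // the actual work, on local data only
        }
        std::memcpy(main_memory + pos, local_store, count * sizeof(float));  // "DMA out"
    }
}

// The "PPE": splits the array into contiguous ranges and hands one to each worker.
void ppe_dispatch(std::vector<float>& data, float scale, unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;  // avoid dividing by zero below
    std::vector<std::thread> workers;
    const std::size_t n = data.size();
    const std::size_t per_worker = (n + num_workers - 1) / num_workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(n, w * per_worker);
        const std::size_t end = std::min(n, begin + per_worker);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back(spe_worker, data.data(), begin, end, scale);
    }
    for (std::thread& t : workers) {
        t.join();  // the PPE waits for every worker to finish
    }
}
```

For example, ppe_dispatch(data, 2.0f, 6) doubles every element using six workers. Handing each worker one contiguous range keeps its copies sequential and lets it stream through the data in local-store-sized blocks; the ranges never overlap, so the workers need no locking.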

On paper, the Cell processor sounds far more powerful than any CPU of its time. But the RISC architecture and the overall design of the Cell also created a number of issues.

The RISC (Reduced Instruction Set Computing) CPU architecture is powerful on paper. It uses a smaller set of simple, highly optimized instructions, and the chip space saved by not supporting a large, complex instruction set such as x86 allows for more CPU registers (the SPEs, for example, have 128 of them). From the perspective of games development, this affords more performance, but at a cost: a RISC program needs more individual instructions to do the same work, so it relies heavily on a good compiler, and getting peak performance often meant hand-tuning code in the programming language of choice (for games, typically C or C++, including newer standards such as C++11).

Branch Predictors

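The Cell’s SPEs also lacked a branch predictor. Conditional statements such as if/else compile down to branches, which let a program jump to different instructions in the code. A branch predictor guesses which way each branch will go, so the CPU can keep fetching and executing instructions instead of waiting for the condition to be resolved; this plays an important role in keeping performance high. Without one, the SPEs had to rely on static guesses and software hints, and a branch that goes the unexpected way stalls the pipeline while the correct instructions are fetched. This is one reason it is so important for RISC CPUs to have a great compiler, but even a great compiler cannot always know ahead of time which way a branch will go.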
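To show what the SPEs were missing, here is a toy model of one classic dynamic scheme, the 2-bit saturating counter. It illustrates the general idea only and is not the predictor used in the PPE or any other specific chip; the class and function names are hypothetical. The counter moves towards "taken" each time the branch is taken and towards "not taken" each time it isn't, and the prediction is whichever side the counter is currently on.

```cpp
#include <cstddef>
#include <vector>

// Toy 2-bit saturating-counter branch predictor for a single branch.
// States 0-1 predict "not taken", states 2-3 predict "taken".
class TwoBitPredictor {
public:
    bool predict() const { return state_ >= 2; }

    // Nudge the counter towards the actual outcome, saturating at 0 and 3.
    void update(bool taken) {
        if (taken && state_ < 3) ++state_;
        if (!taken && state_ > 0) --state_;
    }

private:
    int state_ = 1;  // start in "weakly not taken"
};

// Runs the predictor over a recorded sequence of branch outcomes and
// returns how many predictions were wrong.
std::size_t count_mispredictions(const std::vector<bool>& outcomes) {
    TwoBitPredictor predictor;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (predictor.predict() != taken) ++misses;
        predictor.update(taken);
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through once, repeated three times, this model mispredicts 4 of the 30 branches. On a strictly alternating pattern it gets every single one wrong, which is why real predictors also track branch history.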

For example, here is a simple C++ program, written purely for the purposes of understanding:

#include <iostream>

int main() {
    int x;
    std::cout << "Type in a number less than five: ";
    std::cin >> x;
    if (x < 5) {
        std::cout << "Less than five, yay!\n";
    } else {
        std::cout << "That's five or more!\n";
    }
    return 0;
}

This code asks the user to type in a number and stores it in x. If the number is less than five, the program takes the first branch and prints “Less than five, yay!”; otherwise it takes the second branch (the else) instead. The CPU cannot know which branch will be taken until the user has actually typed something in, which is exactly the kind of decision a branch predictor has to guess at. Fortunately, the SPEs’ local stores are usually large enough to hold the code for both sides of a statement like this, and developers could use branch hints or restructure code to avoid branches altogether, but it’s a complex way to do things, and not a very elegant solution.

In software terms, AI, physics and many other parts of a game rely heavily on these kinds of branching statements.
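One common way around a missing branch predictor, and one that suits SIMD hardware like the SPEs, is to avoid the branch entirely: compute both candidate results and pick one with a bit mask. The SPU instruction set has a select-bits operation for this (exposed as spu_sel in the SPU intrinsics); the portable C++ sketch below builds the mask by hand instead. The AI-flavoured speed example and all names in it are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns a if cond is true, otherwise b, without an if/else.
// -1 in two's complement is all one bits, so the mask keeps either all of a or all of b.
std::int32_t select_branchless(bool cond, std::int32_t a, std::int32_t b) {
    const std::int32_t mask = -static_cast<std::int32_t>(cond);  // all ones or all zeros
    return (a & mask) | (b & ~mask);
}

// Hypothetical AI step: each agent chases (fast) if the player is within
// range, otherwise wanders (slow). Every agent runs the same instructions,
// whichever way its individual decision goes.
std::vector<std::int32_t> choose_speeds(const std::vector<std::int32_t>& distances,
                                        std::int32_t chase_range,
                                        std::int32_t chase_speed,
                                        std::int32_t wander_speed) {
    std::vector<std::int32_t> speeds(distances.size());
    for (std::size_t i = 0; i < distances.size(); ++i) {
        speeds[i] = select_branchless(distances[i] < chase_range, chase_speed, wander_speed);
    }
    return speeds;
}
```

Both values are always computed, so this pays off when each side is cheap. Modern compilers will often make a similar transformation on their own (for example using conditional-move instructions), but on hardware without a predictor it was a technique programmers applied deliberately.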

The SPEs are incredibly powerful and can take on a wide variety of different tasks. In the PlayStation 4, much of the work that the SPEs did is now handled by the Asynchronous Compute Engines (ACEs) of AMD’s GCN (Graphics Core Next) GPU architecture. To put it another way, compute work is now handled by the SIMD architecture of the graphics card. In this respect, the SPEs can function very much like stream processors, where many processors come together to work on a single task.

The SPEs could handle a wide range of instructions:

  • Based on VMX / AltiVec – some instructions added, some removed.
  • Includes some (all?) of the PS2’s Emotion Engine ISA.
  • Supports vector or scalar operations.
  • Includes loads, stores, branches and branch hints.
  • 8, 16, 32 and 64 bit integer operations.
  • Single and double precision floating point.
  • Saturation arithmetic for FP (not integer).
  • Simplified rounding modes for single precision FP.
  • IEEE 754 support for double precision FP (not precise mode).
  • Logical operations.
  • Byte operations: Shuffle, Permute, Shift and Rotate (Shift / Rotate per Qword or slot).
  • 128 registers, each 128 bits wide.
  • Local Store DMA I/O (to / from any address in system).
  • Commands for mailbox access, interrupts etc.
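Two items in that list, vector operations on 128-bit registers and byte shuffles, are easiest to picture with a small model. The sketch below treats one 128-bit register as four 32-bit float lanes: a single "add" applies to all four lanes at once (the SIMD idea), and a shuffle rearranges lanes according to a pattern. It is plain scalar C++ written to show the semantics, not SPU intrinsics, and the type and function names are hypothetical. The real SPU shuffle selects individual bytes from two source registers; this simplifies it to whole lanes of one register.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Model of one 128-bit vector register holding four 32-bit floats.
using Vec4 = std::array<float, 4>;

// One "instruction", four results: lane-wise addition.
Vec4 vadd(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (std::size_t lane = 0; lane < 4; ++lane) r[lane] = a[lane] + b[lane];
    return r;
}

// Lane-wise fused multiply-add: a * b + c in every lane, rounded once.
Vec4 vmadd(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 r{};
    for (std::size_t lane = 0; lane < 4; ++lane) r[lane] = std::fma(a[lane], b[lane], c[lane]);
    return r;
}

// Simplified shuffle: result lane i takes lane pattern[i] of the source
// (the % 4 keeps an out-of-range pattern entry inside the register).
Vec4 vshuffle(const Vec4& src, const std::array<std::size_t, 4>& pattern) {
    Vec4 r{};
    for (std::size_t lane = 0; lane < 4; ++lane) r[lane] = src[pattern[lane] % 4];
    return r;
}
```

On real hardware a lane-wise multiply-add like vmadd performs eight floating-point operations (four multiplies and four adds) in one vector instruction, which is where headline SIMD FLOPS figures come from.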

In the PlayStation 3, getting the most out of these SPEs became essential to high performance. Developers eventually learned to farm off work such as graphics processing, physics, anti-aliasing and artificial intelligence, along with functions such as game audio.

Overall, the PlayStation 3’s processor was expensive to produce, costing Sony hundreds of millions of US dollars for the production plant alone. This raised the price of the machine at launch, and it took a move to a smaller manufacturing process (from 90nm to 65nm) before Sony could reduce the price of the machine.
