There’s a pretty good chance that if you’re a PC gamer or power user, you have more than a passing interest in AMD’s upcoming Zen processor lineup. It’s a radical redesign and a new approach for the company; and while AMD are touting Zen as more than just Summit Ridge (the desktop platform), positioning it for servers and low-power applications too, for the average consumer the desktop parts are the most exciting.
AMD’s current lineup of desktop processors isn’t necessarily ‘bad’ when it comes to multi-tasking, but it does fall short in single-threaded performance, which hasn’t done the company any favors in games. That wasn’t always the case – several of AMD’s architectures have given Intel a run for its money, for example the K6-III processors, the original Athlons and, who can forget, the Athlon 64. But that was the past – what are team red bringing to the future?
AMD’s President and CEO, Dr Lisa Su, recently claimed that Zen gives the company its “most competitive product lineup in a decade,” while Mark Papermaster, the company’s Senior Vice President and Chief Technology Officer, described the new CPU as a “quantum leap in core execution capability… [delivering] dramatic gains in single-threaded performance.”
So what has changed with AMD’s Zen?
Looking at the basic Zen block diagram, one of the more obvious changes is the inclusion of a Micro-op cache, whose role is to ‘cache’ instructions as they’re decoded. When a new instruction is fetched, the CPU first checks whether it has already been decoded and stored in the Micro-op cache; if it has, the decoded operations can be supplied straight from the cache, in essence reducing the workload of the CPU’s front end. Unfortunately, just like the boasted “branch prediction enhancements,” AMD are pretty vague on how this is actually achieved at a low level.
The purpose of “branch prediction” is a pretty simple concept: the CPU basically ‘guesses’ which way it believes a program’s branch will go. If you’re still a little unsure, most programming languages have structures such as ‘if’, ‘then’ and ‘else’. What the CPU is doing is guessing ahead of time which of those paths the application will take, so it can keep the flow of instructions from memory to the CPU’s cores moving.
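To make that a little more concrete, here’s a tiny sketch of a classic 2-bit saturating counter predictor – the textbook scheme, not whatever AMD have actually built into Zen (they haven’t said) – just to show what ‘guessing’ a branch looks like in practice:

```python
# A toy 2-bit saturating counter branch predictor - a classic textbook scheme,
# not AMD's actual design, purely to illustrate what 'guessing a branch' means.

class TwoBitPredictor:
    def __init__(self):
        # 0-1 = predict not taken, 2-3 = predict taken
        self.counter = 2

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        # Nudge the counter toward the actual outcome, saturating at 0 and 3
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)

predictor = TwoBitPredictor()
outcomes = [True, True, False, True, True, True]  # e.g. a loop branch
correct = 0
for actual in outcomes:
    if predictor.predict() == actual:
        correct += 1
    predictor.update(actual)
print(f"{correct}/{len(outcomes)} branches predicted correctly")
```

A loop branch, for instance, is taken over and over again, so a predictor like this quickly learns to guess ‘taken’ and only gets it wrong when the loop finally exits.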
This all comes down to Zen’s decoders, which in simple terms convert the ‘instructions’ the CPU has fetched from the system’s RAM into the internal operations the rest of the CPU can actually execute. All AMD have said on the issue is that Zen can decode four instructions per cycle, which in turn get fed into the operations queue. Working in tandem with the op-cache, you’re looking at up to six operations per cycle.
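To tie the op-cache and the decoders together, here’s a rough toy model of the front end. The four-decodes-per-cycle and six-operations-per-cycle figures come from AMD’s own slides; everything else (the dictionary standing in for the op-cache, the decode() stub) is purely illustrative:

```python
# A toy model of Zen's front end: fetched instructions are looked up in a
# micro-op cache first; only misses go through the decoders. The 4-per-cycle
# decode and 6-per-cycle op-cache figures come from AMD's slides; everything
# else here is made up for illustration.

DECODE_WIDTH = 4      # instructions the decoders handle per cycle
OP_CACHE_WIDTH = 6    # micro-ops the op-cache can deliver per cycle

micro_op_cache = {}   # maps an instruction -> its already-decoded micro-ops

def decode(instruction):
    # Stand-in for the real decoders: turn an instruction into micro-ops
    return [f"uop({instruction})"]

def front_end_cycle(fetched_instructions):
    delivered = []
    decoded_this_cycle = 0
    for inst in fetched_instructions:
        if inst in micro_op_cache and len(delivered) < OP_CACHE_WIDTH:
            delivered.extend(micro_op_cache[inst])   # op-cache hit: no decode needed
        elif decoded_this_cycle < DECODE_WIDTH:
            uops = decode(inst)                      # op-cache miss: spend a decode slot
            micro_op_cache[inst] = uops              # remember it for next time
            delivered.extend(uops)
            decoded_this_cycle += 1
    return delivered

print(front_end_cycle(["add", "mov", "add", "cmp"]))  # first pass: mostly decodes
print(front_end_cycle(["add", "mov", "add", "cmp"]))  # second pass: op-cache hits
```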
We’re still waiting on more information about the execution units (which actually process the instructions). From what we can tell from the diagram, AMD have chosen to use separate schedulers for integer and floating point workloads. In theory (and theory it will remain until we can get the CPU in for testing), this approach allows for a wider core and a more parallel processing structure (which makes a lot of sense, given SMT support).
Really, the efficiency of such a design comes down to the processor’s caches and buffers, and we’ve (once again) got precious little information on those, apart from a few figures AMD have shared: each core gets a 1.75x (75 percent) larger instruction scheduler window, allowing more instructions to be kept in flight for out-of-order execution, and a 1.5x (50 percent) wider issue width.
If you’ve read all of that and are still a little unsure what it means, the takeaway is that the Zen cores have been redesigned with performance and multi-threading in mind, which should translate to considerably better performance.
The L1 (Level 1) cache of Zen has also seen quite the overhaul, having doubled in size and increased in associativity since Bulldozer (associativity is a bit complex to explain, but in a 2-way associative cache, any given location in memory can be cached in either of two locations within that cache – for more, check out this wiki link). Per Zen core, you have a 64KB 4-way L1 instruction cache and a 32KB 8-way L1 data cache, backed by 512KB of 8-way Level 2 cache, and finally 8MB of 16-way Level 3 cache shared across FOUR Zen cores.
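If the associativity talk still feels a little abstract, the sketch below shows how a set-associative cache decides where an address can live, using the 32KB 8-way L1 data figures above. Note that the 64-byte cache line size is my assumption to make the arithmetic concrete, not something AMD have spelled out here:

```python
# A quick sketch of how a set-associative cache decides where an address can
# live, using the 32KB 8-way L1 data cache figures above. The 64-byte cache
# line size is assumed purely to make the arithmetic concrete.

CACHE_SIZE = 32 * 1024   # 32KB L1 data cache
WAYS = 8                 # 8-way set associative
LINE_SIZE = 64           # assumed cache line size in bytes

num_sets = CACHE_SIZE // (WAYS * LINE_SIZE)   # 32768 / (8 * 64) = 64 sets

def cache_location(address):
    line_offset = address % LINE_SIZE             # byte within the cache line
    set_index = (address // LINE_SIZE) % num_sets # which set the line must go into
    tag = address // (LINE_SIZE * num_sets)       # identifies which line occupies a way
    return set_index, tag, line_offset

set_index, tag, offset = cache_location(0x12345)
print(f"{num_sets} sets; address 0x12345 -> set {set_index}, tag {tag:#x}, offset {offset}")
```

Each address maps to exactly one of the 64 sets, but within that set the line can sit in any of the eight ways – which is all ‘8-way associative’ really means.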
There’s a little confusion over whether the Level 3 cache is unified across those four Zen cores – meaning whether Core 0 can read data which Core 2 (for example) has written into the Level 3 cache. The main bullet point AMD would probably like enthusiasts to focus on, however, is “up to 5x cache bandwidth to a core.”
Zen and 14nm FinFET – The Perfect Match?
But Paul, I hear you cry – while we love hearing you talk about the changes to the cores and cache (aww, thanks – me), what about the 14nm FinFET process; what does that bring to Zen’s table? Well, firstly, let’s establish that 14nm and FinFET are two separate things. 14nm represents the size of the process node (smaller is better), whereas FinFET stands for Fin Field Effect Transistor. The conducting channel is wrapped in a very thin silicon fin, and this ‘wrap’ provides better control over the channel and reduces leakage.
AMD have already shown a shift towards power efficiency with Carrizo (check our Athlon X4 845 review for more info), and that focus was pushed even harder on the engineers working on Zen. It’s one of the driving forces behind the introduction of the Micro-Op cache we talked about earlier in this very article (though a performance boost for Zen is obviously another key benefit). In short, if the front end of Zen doesn’t have to do as much work, because the instruction is already resident in the Micro-Op cache, the CPU can reduce power consumption.
https://www.youtube.com/watch?v=SPZkdFgg-xo
It’s also said AMD will be implementing Clock Gating (rather aggressively) in Zen’s processor design too. This is a fairly well-established power-saving technique, which has been used on processors as far back as Intel’s Pentium 4. The CPU simply only activates the clocks (hence the name) in a logic block when there’s actually work to be done.
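If you want to picture the idea, here’s a deliberately over-simplified software model – real clock gating is done with gating cells on the chip’s clock tree, not code, but the principle is the same:

```python
# A conceptual model of clock gating - in real silicon this is handled by
# gating cells on the clock tree, not software, but the idea is identical:
# don't toggle a block's clock (and burn power) when it has nothing to do.

class FPUnit:
    def __init__(self):
        self.pending_work = []
        self.active_cycles = 0
        self.gated_cycles = 0

    def tick(self):
        if self.pending_work:            # clock enabled: the block does work
            self.pending_work.pop(0)
            self.active_cycles += 1
        else:                            # clock gated: no switching, power saved
            self.gated_cycles += 1

fpu = FPUnit()
fpu.pending_work = ["mulpd", "addpd"]    # only two FP instructions in flight
for _ in range(10):                      # simulate ten clock cycles
    fpu.tick()

print(f"active cycles: {fpu.active_cycles}, gated cycles: {fpu.gated_cycles}")
# -> active cycles: 2, gated cycles: 8
```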
It takes two threads to Simultaneous Multi Tango
When describing SMT (Simultaneous Multi-Threading), the ‘easy’ point of comparison for me to draw is Intel’s Hyper-Threading. The idea is that each physical Zen core will be able to execute two threads, and to the operating system (and just as importantly, applications) each core simply appears as two logical processors.
Intel aren’t the only game in town when it comes to SMT, and if you’ve used an Xbox 360 (for instance) you’ve used a non-Intel system which supports multi-threading. The Xbox 360 features an IBM Xenon CPU, which has three symmetrical cores, each capable of handling two threads.
There are certainly considerations to be made when designing a CPU with SMT in mind: quite simply, each Zen core must have sufficient resources available so that the two threads aren’t fighting over each other’s cache, bandwidth and so on. If they are, the benefit of such a design is largely negated. In a perfect world, Zen’s SMT will take advantage of natural gaps in each core’s utilization, so while one thread is waiting for data to arrive (for example), the execution units can work on something else which is ready.
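Here’s a very rough, made-up model of that ‘filling the gaps’ idea – a single issue slot shared by two threads, with thread B sneaking in whenever thread A is stalled. It’s nothing like Zen’s real scheduling logic, but it shows where the extra throughput comes from:

```python
# A toy illustration of why SMT helps: when one thread's next instruction is
# stalled (waiting on data, say), the other thread can use the issue slot.
# This is a made-up single-issue model, not Zen's actual scheduling logic.

# Each thread's stream in program order; None marks a cycle-long stall before
# the next instruction can issue.
thread_a = ["a1", None, None, "a2", "a3"]
thread_b = ["b1", "b2", "b3", "b4", "b5"]

def run_smt(stream_a, stream_b):
    a, b = list(stream_a), list(stream_b)
    cycles, executed = 0, []
    while a or b:
        cycles += 1
        if a and a[0] is not None:
            executed.append(a.pop(0))      # thread A issues this cycle
        elif b and b[0] is not None:
            executed.append(b.pop(0))      # A is stalled or done, B fills the slot
            if a:
                a.pop(0)                   # A's stall cycle elapses in the background
        else:
            if a: a.pop(0)                 # both stalled: the cycle is simply wasted
            if b: b.pop(0)
    return cycles, executed

cycles, executed = run_smt(thread_a, thread_b)
print(f"{len(executed)} instructions in {cycles} cycles with SMT")
```

Run back to back without SMT, those two streams would need ten cycles (including thread A’s two stalls); sharing the core, they finish in eight.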
The additional performance these SMT threads bring to the table depends on a multitude of factors, including the CPU’s architecture, the application and OS and also memory and other components.
Wrapping up Part 1
We’re almost 1,300 words into the article (kudos to you if you’ve stuck through all of this), and you might be asking yourself what this all means. Well, right at this moment we’re waiting on even more information about the Zen architecture to be released by the company – hopefully over the next few weeks (at events such as Hot Chips). But AMD did show that, currently, an 8-core (16-thread) Zen is capable of matching Intel’s i7-6900K when both CPUs run at the same clock speed.
During this demonstration, Intel’s CPU was clocked down to just 3GHz, as that is the speed Zen’s current engineering samples run at – so it’s understandable the company wanted as close to an apples-to-apples comparison as it could manage. It does bring several questions to the forefront: are there any other changes or improvements to the overall Zen design beyond the claimed 40 percent IPC uplift, and what will Zen’s final clocks even be when the product is released in 2017?
One can also make the logical argument that Zen is merely matching Intel as of mid-2016 – are Intel just waiting for AMD to release Zen, ready to counter with something a little faster? And even if they are, does it matter, if AMD are able to release Zen in higher core configurations for the same price? For example, if AMD release an 8-core / 16-thread Zen for the same price as Intel’s upcoming Kaby Lake 7700K, or for less than Broadwell-E (and its successor), with only a marginal difference in per-core performance and a similar number of threads, AMD are sure to make a lot of power users and PC gamers very happy.
AMD’s Lisa Su has recently given interviews in which she was keen to remind potential investors that while the traditional desktop market is shrinking, gaming, high-performance computing and other such workloads are growing – and those are precisely the markets AMD are targeting with the release of Zen.
Until then, it’s all a waiting game. But one thing is for certain: AMD have gotten our attention (customers’ and the press’s alike), and they are well aware of that fact. Let’s see what the company can bring to the table.