It’s fair to say that with the recent news that the Radeon R9 300 series will make its June 3rd debut at Computex, there’s a lot of excitement in the air, particularly around the R9 390X, which will feature HBM (High Bandwidth Memory). AMD have recently released an official set of slides explaining why they’ve opted to use HBM for their next generation flagship GPU, and just how it works.
Graphics cards have come a long way over the years, but it isn’t just a case of a more powerful GPU; the memory bandwidth to feed that processor is just as critical. GPUs have advanced with a myriad of technologies, with GDDR5 the usual memory employed by higher end parts, but GDDR5 is reaching the point where it’s not enough to feed the highest performance parts, and it also has a few other problems – particularly when used in smaller form factors.
AMD and Nvidia are close to hitting a ‘wall’ in how much memory bandwidth they can squeeze out of GDDR5. With the supposed rebranding of the Hawaii GPUs in the R9 300 series featuring a 512-bit bus and a 1,500MHz memory clock, the bandwidth will be around 384GB/s.
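To see roughly where that 384GB/s figure comes from, here’s a quick back-of-the-envelope sketch in Python. It assumes the 1,500MHz figure is the actual memory clock and applies GDDR5’s usual quad data rate (four transfers per clock); the function name is purely illustrative.

```python
# Rough sketch of how the 384GB/s figure falls out of bus width and clock,
# assuming 1,500MHz is the memory clock and GDDR5's quad data rate applies.

def gddr5_bandwidth_gbs(bus_width_bits, memory_clock_mhz, transfers_per_clock=4):
    """Peak theoretical bandwidth in GB/s."""
    pin_rate_gbps = memory_clock_mhz * transfers_per_clock / 1000  # per-pin rate, Gb/s
    return bus_width_bits * pin_rate_gbps / 8  # divide by 8 to go from bits to bytes

print(gddr5_bandwidth_gbs(512, 1500))  # 384.0 GB/s, matching the figure above
```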
With GPUs growing more powerful with each release, memory bandwidth requirements rise with each generation – but GDDR5 is hitting the point where simply raising its clock speeds or using a wider bus is no longer enough to keep up with those demands. According to AMD in the above slide, GDDR5 is no longer providing an efficient return in performance for the amount of power it consumes.
GDDR5 chips also eat up a lot of room on the PCB – modern graphics cards have grown significantly in length because of the number of GDDR5 chips featured on the board. A wider memory bus requires more chips, and this also means a greater number of voltage regulators on the PCB, further increasing size. High performance desktop gamers might not mind this as much as those interested in smaller form factors, but a smaller PCB is always nice – particularly if you’re planning on running multiple graphics cards.
Lower TDP is also a bonus: if the GPU eats up less power, it’ll produce less heat and you’ll not need a portable nuclear reactor to power it.
Think of the interposer as a set of ‘wires’ which links the memory to the GPU. The GPU and the memory sit on this interposer, and this close proximity allows greater memory bandwidth at considerably lower power consumption.
In the below image, you can see how the GPU sits ‘next’ to the HBM DRAM (a so-called 2.5D design). The memory chips are then stacked on top of each other, with interconnects known as TSVs (Through-Silicon Vias) connecting each chip to the next so the chips can communicate with each other.
AMD and SK Hynix worked together to create the specifications of High Bandwidth Memory, and I was told in a phone call with AMD that while they’d managed to keep most details of their next generation GPUs under wraps, the approval process of defining HBM was ultimately what partially let the cat out of the bag.
High Bandwidth Memory is a totally different beast from GDDR5. It requires considerably less voltage than GDDR5 to get the job done, but the biggest change is doubtless the bus width. Per package, GDDR5 has a width of just 32 bits, which is why a card with, say, a 256-bit interface requires 8 modules and a 512-bit interface requires 16, while HBM runs a 1024-bit bus width per package.
The clock speed differences between the two technologies might get you scratching your head – but in reality, the two technologies operate in extremely different ways. GDDR5 relies primarily on its clock speed, while HBM’s ultra wide memory bus more than makes up for the difference. The end result is that the fastest GDDR5 memory provides just 28GB/s per chip, compared to 100GB/s per stack of High Bandwidth Memory.
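The same arithmetic makes the per-package gulf obvious. The sketch below assumes typical pin rates – roughly 7Gbps per pin for the fastest GDDR5 and roughly 1Gbps per pin for first-generation HBM (500MHz, double data rate) – and the helper function is again just for illustration.

```python
# Illustrative per-package comparison. Pin rates are assumptions based on
# typical figures: ~7Gbps/pin for top-end GDDR5, ~1Gbps/pin for first-gen HBM.

def package_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    """Peak theoretical bandwidth per package in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8  # bits -> bytes

print(package_bandwidth_gbs(32, 7.0))    # GDDR5 chip: 28.0 GB/s
print(package_bandwidth_gbs(1024, 1.0))  # HBM stack: 128.0 GB/s theoretical peak
                                         # (AMD's slides quote 100GB/s+ per stack)
```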
There have been a lot of rumors that the PCB of the R9 390X will be considerably smaller than that of the R9 290X, and AMD certainly hints this is true in the below slide. The PCB for an “HBM-based ASIC” is said to be about 50 percent smaller – which is a huge saving in terms of size.
This will also be a large benefit for APUs, as there are plans afoot for AMD to combine its Zen CPU architecture with its next generation graphics technology, creating highly powerful APU solutions. In theory, this will provide AMD a lot of interesting opportunities with custom or semi-custom designs for a variety of different purposes – as size and power consumption have been a stumbling block in high performance mobile parts (including laptops) for some time.
While shrinking the process node (from, say, 28nm down to 16nm FinFET) will help some (reducing power consumption by around the 40 percent mark), ultimately smaller and more efficient memory is just as vital to building a powerful mobile device – or ultra powerful desktop solutions.
According to these slides, we should experience 3x the performance per watt of GDDR5, a considerably smaller PCB and a brand new memory interface – and that’s just for now. We’ve already had news that HBM2 is coming along rather nicely, and it will easily outshine the bandwidth provided by HBM1. Indeed, Nvidia are boasting that Pascal will feature HBM2 and will operate at a memory bandwidth of around 1TB/s.
Consider that the R9 390X is rumored to feature 4096 shaders, running at similar clocks to the R9 290X (which has 2816 shaders) – that’s a significant number of additional shaders, but GDDR5 bandwidth just hasn’t really increased that much. While it’s certainly possible for AMD to use faster clocked RAM, it’s not an efficient way to do things, and even if they ran a 512-bit bus with GDDR5 memory clocked to the limit (7,000MHz effective, 1,750MHz actual) they would only achieve around 448GB/s of bandwidth, a far cry from the 640GB/s of their HBM1 technology.
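As a quick sanity check on those numbers, the snippet below runs the same peak-bandwidth arithmetic for a 512-bit GDDR5 bus at its practical ~7Gbps-per-pin limit and compares it against the rumored HBM figure.

```python
# Sanity check: 512-bit GDDR5 at ~7Gbps per pin vs the rumored HBM1 figure.
gddr5_limit_gbs = 512 * 7.0 / 8   # 448.0 GB/s, as stated above
hbm_rumored_gbs = 640.0           # GB/s, the rumored R9 390X HBM1 bandwidth
print(gddr5_limit_gbs, hbm_rumored_gbs / gddr5_limit_gbs)  # 448.0, ~1.43x more from HBM
```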