At CES 2014, Project Logan was given its official name: Tegra K1. The name is a complete departure from previous versions of Tegra. According to Nvidia’s CEO Jen-Hsun Huang, the chip is such a big leap in technology that simply calling it Tegra 5 wouldn’t do it justice.
There are two versions of the Tegra K1, each with its own CPU configuration. The first uses a quad-core ARM Cortex A15 CPU. The second uses a dual-core, 64-bit CPU known as Project Denver, a custom design from Nvidia itself. We’ll discuss both of them in detail in a few moments.
It’s clear that Nvidia are taking gaming on the move seriously, and they are doing so by leveraging the performance of their Kepler architecture. Some of you might recognize Kepler as the same architecture used in the company’s current flagship desktop graphics cards, for instance the Nvidia GTX 780 Ti. The GPU inside the Tegra K1 is, for all intents and purposes, built on this same technology.
This means that the Tegra K1 can support all of the big industry standards, including OpenGL 4.4, DirectX 11 and OpenGL ES 3.0, as well as Nvidia’s own technologies such as CUDA and hardware PhysX.
Fortunately, Nvidia’s SMX (Streaming Multiprocessor) design is very well suited to the mobile environment. Requiring only 5W of power, the chip uses several technologies to reduce power consumption, which is naturally critical in a mobile part. A large Level 2 cache helps increase the ‘hit rate’, meaning the percentage of memory requests that can be served from the cache. The larger the cache, and the better the hardware and compiler are at keeping relevant data inside it, the higher the hit rate and the less often the chip has to go out to main RAM. Because main RAM sits off the chip, every access avoided this way saves power. Color compression, Early Z Culling and Primitive Culling further preserve performance and bandwidth and reduce the chip’s workload.
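To make the hit-rate idea concrete, here is a minimal sketch that counts how many memory requests miss the cache and must go to off-chip RAM. The request count and hit rates are hypothetical numbers chosen purely for illustration; they are not measurements of the Tegra K1.

```python
# Minimal sketch: a higher cache hit rate means fewer trips to off-chip RAM.
# All numbers here are hypothetical and only illustrate the relationship.

def dram_accesses(n_requests: int, hit_rate: float) -> int:
    """Number of requests that miss the cache and must be fetched from main RAM."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(n_requests * (1.0 - hit_rate))

n_requests = 1_000_000  # hypothetical number of memory requests
for hit_rate in (0.50, 0.80, 0.95):
    misses = dram_accesses(n_requests, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {misses:,} off-chip accesses")
```

Raising the hit rate from 80% to 95% cuts off-chip traffic by a factor of four in this toy model, which is why a bigger L2 pays off in power as well as speed.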
A Single SMX Means 192 CUDA Cores
The Tegra K1 GPU features 192 CUDA cores, which make up a single SMX (Streaming Multiprocessor). This figure might seem low, considering that Nvidia’s own GTX 780 Ti features 2880 CUDA cores, but for a mobile device it’s very impressive. It’s probably fairer to compare it to something along the lines of the GeForce GT 740M (a mobile GPU), which features 384 CUDA cores thanks to two SMXes. Performance is hardly going to set the world alight compared to a high-end desktop GPU, but it will be impressive enough.
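Since every Kepler SMX holds 192 CUDA cores, the core counts quoted above translate directly into SMX counts. The short sketch below does that arithmetic using only the figures from this article; it compares raw core counts and ignores differences in clock speed and memory.

```python
# Convert the CUDA core counts quoted above into Kepler SMX counts.
CORES_PER_SMX = 192  # each Kepler SMX contains 192 CUDA cores
TEGRA_K1_CORES = 192

gpus = {
    "Tegra K1": 192,
    "GeForce GT 740M": 384,
    "GeForce GTX 780 Ti": 2880,
}

for name, cores in gpus.items():
    smx = cores // CORES_PER_SMX
    share = TEGRA_K1_CORES / cores  # Tegra K1's core count as a fraction of this GPU's
    print(f"{name}: {cores} CUDA cores = {smx} SMX; Tegra K1 has {share:.1%} as many cores")
```

The GTX 780 Ti works out to 15 SMXes, so the Tegra K1 has roughly one fifteenth of its shader hardware, while the GT 740M has exactly twice the K1’s.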
The K1 features four ROPs, eight texture units and 128KB of Level 2 cache. In terms of features, the GPU isn’t crippled by weaker tessellation or geometry engines, and FP64 is present at 1/24 of the FP32 rate.
Of course, while we’d all love to have a mobile device featuring half a dozen SMXes, in reality there are numerous issues preventing that, including die size and power requirements. The Tegra’s 5W is manageable for the small batteries of mobile devices, but 15 or 20W certainly would not be. In a full desktop GPU, the SMXes don’t operate alone; they must constantly communicate with each other, as well as with the ROPs and memory controllers. Perhaps the biggest change in the Tegra K1 is that, with only a single SMX, much of this complexity has been removed.
The Quad-Core ARM Cortex A15
The first CPU option is based on the ARM Cortex A15 and features four CPU cores, which run when there’s a need for heavy performance (for example, when you’re playing a game). There’s also a fifth core, which Nvidia calls a ‘shadow core’ or ‘companion core’. When very little is going on in the system, for example when it’s in standby mode or downloading an update, this core takes over. Up to four cores can be active at once (the fifth isn’t used in high-performance situations), and they are activated one at a time to meet demand, starting with one core and scaling up.
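The switching behaviour described above can be modelled as a simple policy: run on the companion core when demand is very light, otherwise switch to the main cluster and power up main cores one at a time until demand is covered. This is a hypothetical sketch; the threshold value and the way load is measured are invented for illustration and are not Nvidia’s actual governor logic.

```python
import math

# Hypothetical model of a companion-core ("4+1") scaling policy.
# The threshold below is invented for illustration, NOT Nvidia's real value.
LOW_LOAD_THRESHOLD = 0.15  # below this, only the companion core runs (hypothetical)
MAX_MAIN_CORES = 4

def active_cores(load: float) -> tuple[str, int]:
    """Return (cluster, number of powered cores) for a given demand.

    `load` is total demand in units of one main core's capacity,
    so 2.5 means "two and a half cores' worth of work".
    """
    if load <= LOW_LOAD_THRESHOLD:
        # Light work such as standby or background downloads.
        return ("companion", 1)
    # Switch to the main cluster and add cores one at a time, up to four.
    cores = min(MAX_MAIN_CORES, max(1, math.ceil(load)))
    return ("main", cores)

for load in (0.1, 0.5, 2.5, 7.0):
    print(load, active_cores(load))
```

The companion core and the main cores are never used together in this model, which mirrors the article’s point that the fifth core sits idle during high-performance work.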
Unlike the Project Denver CPU (discussed below), this CPU is a 32-bit processor.
The main change from the A15 in the Tegra 4 is simple: better performance for less power. Running at up to 2.3GHz, the CPU should be fast enough to keep the K1’s GPU fed. Memory is 64-bit LPDDR3, which feeds both the GPU and the CPU.
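Peak memory bandwidth follows from the bus width and the memory’s data rate: bytes per transfer times transfers per second. The 64-bit bus comes from the article; the data rates below are assumed LPDDR3 speed grades used only for illustration, since the actual memory clock of shipping devices isn’t stated here.

```python
# Peak theoretical bandwidth = (bus width in bytes) x (transfers per second).

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and data rate (MT/s)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9

# 64-bit bus from the article; 1600 and 1866 MT/s are ASSUMED LPDDR3 speeds.
for rate in (1600, 1866):
    print(f"64-bit LPDDR3 at {rate} MT/s: {peak_bandwidth_gbs(64, rate):.1f} GB/s")
```

Keep in mind that this single pool of bandwidth is shared between the CPU and the GPU, so the GPU never gets all of it.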
The Project Denver Dual-Core 64-bit
Chances are, if you’ve been following Nvidia or technology for a while, you’ve at least heard the codename “Project Denver”. It was teased by Nvidia way back in the mists of 2011, and despite a few words here or there, we’ve not really heard much about it until now. Interestingly, Nvidia claims that the two SoCs are pin-compatible. There’s been no confirmation of the memory used with this CPU, but because the chips are pin-compatible, there’s a good bet that it’ll use the same 64-bit LPDDR3.
There are likely many of you reading this and scoffing: dual-core? Forget that, give me that quad-core goodness! Well, there’s a reason that Nvidia are dubbing this a “dual super core”. Sure, you get only two cores, but they achieve a much higher IPC (Instructions Per Clock) than the quad-core A15 above.
It has been confirmed that the dual super core Denver will be 64-bit and 7-way superscalar, running at a rather nippy 2.5GHz and sporting a whopping 128KB instruction cache plus a 64KB data cache. For now, much about how the CPU works remains a mystery. The rumor is that, internally, x86 code is converted into the CPU’s own format before it reaches the core.
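The “dual super core” argument comes down to a simple product: throughput is roughly cores times IPC times clock speed. The sketch below uses the clock speeds from this article, but the IPC values are invented placeholders, not measured or announced figures. A 7-way superscalar design can issue at most seven instructions per clock; real code sustains far less.

```python
# Rough CPU throughput model: cores x IPC x clock.
# Clock speeds are from the article; both IPC values are HYPOTHETICAL placeholders.

def throughput_ginstr(cores: int, ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second = cores * IPC * clock (GHz)."""
    return cores * ipc * clock_ghz

A15_CLOCK_GHZ = 2.3
DENVER_CLOCK_GHZ = 2.5
A15_IPC = 1.0     # hypothetical sustained IPC
DENVER_IPC = 2.0  # hypothetical sustained IPC (peak issue width is 7)

print(f"single thread: A15 {throughput_ginstr(1, A15_IPC, A15_CLOCK_GHZ):.1f}, "
      f"Denver {throughput_ginstr(1, DENVER_IPC, DENVER_CLOCK_GHZ):.1f} billion instr/s")
print(f"all cores:     A15 {throughput_ginstr(4, A15_IPC, A15_CLOCK_GHZ):.1f}, "
      f"Denver {throughput_ginstr(2, DENVER_IPC, DENVER_CLOCK_GHZ):.1f} billion instr/s")
```

With these made-up IPC values, Denver more than doubles single-threaded throughput while roughly matching the quad-core A15 when every core is busy. The real outcome depends entirely on the IPC each design actually sustains.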
Tegra K1: Faster than the Xbox 360 & PS3?
Performance is always a ‘relative’ term. If we look purely at the GFLOPS count, the Tegra K1 easily beats the PS3 and Xbox 360. Nvidia reached their shader GFLOPS figure with a fairly simple calculation: the GPU has 192 CUDA cores, each performing 2 FLOPs per clock (a fused multiply-add counts as two operations), so multiply 192 × 2 by the 950MHz clock speed. A 950MHz clock is a fairly lofty goal, but if we do this math, we arrive at the ‘correct’ figure of about 365 GFLOPS.
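The calculation above is easy to reproduce. This sketch computes peak FP32 throughput from the core count and the 950MHz clock, then applies the 1/24 FP64 rate mentioned earlier. These are theoretical peaks; real games never sustain them.

```python
# Peak shader throughput = CUDA cores x FLOPs per core per clock x clock speed.
FMA_FLOPS = 2  # one fused multiply-add per core per clock counts as 2 FLOPs

def peak_gflops(cuda_cores: int, clock_mhz: float, flops_per_clock: int = FMA_FLOPS) -> float:
    """Peak GFLOPS; dividing by 1000 converts MHz-based results into GFLOPS."""
    return cuda_cores * flops_per_clock * clock_mhz / 1000

fp32 = peak_gflops(192, 950)  # 192 CUDA cores at 950MHz
fp64 = fp32 / 24              # FP64 runs at 1/24 of the FP32 rate
print(f"FP32: {fp32:.1f} GFLOPS, FP64: {fp64:.1f} GFLOPS")
```

This prints 364.8 GFLOPS for FP32, which rounds to the roughly 365 GFLOPS figure, and about 15.2 GFLOPS for FP64.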
So is it fair to say that the Tegra K1 will indeed be faster than the previous generation of consoles? The Tegra K1 has fewer ROPs, but this is somewhat offset by its design and higher clock speed. To get the texture fill rate in Gtexels/s, multiply the number of texture units (8) by the clock speed (950MHz); this gives the same result as multiplying the ROP count (4) by the clock speed and doubling it. The lower memory bandwidth will certainly hurt, but you also get a larger pool of RAM.
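Both fill rates follow the same pattern: number of units times clock speed. This sketch assumes the usual convention for peak figures, namely that each ROP writes one pixel and each texture unit delivers one texel per clock, and uses the unit counts and 950MHz clock from this article.

```python
# Peak fill rates = units x clock, assuming one operation per unit per clock.
CLOCK_MHZ = 950    # GPU clock used in the figures above
ROPS = 4
TEXTURE_UNITS = 8

def fill_rate(units: int, clock_mhz: float) -> float:
    """Peak rate in billions per second (Gpixels/s or Gtexels/s)."""
    return units * clock_mhz / 1000

pixel_fill = fill_rate(ROPS, CLOCK_MHZ)           # Gpixels/s
texel_fill = fill_rate(TEXTURE_UNITS, CLOCK_MHZ)  # Gtexels/s
print(f"Pixel fill: {pixel_fill:.1f} Gpixels/s, texture fill: {texel_fill:.1f} Gtexels/s")
```

Because the K1 has exactly twice as many texture units as ROPs, the texel rate (7.6 Gtexels/s) is double the pixel rate (3.8 Gpixels/s), which is why the “ROPs × clock × 2” shortcut happens to work.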
What does all of this mean? Nvidia have been proud to point out DirectX 11 support and to suggest that games will easily be ported from a high-spec desktop PC to the Tegra K1, but that claim is somewhat overreaching. On the other hand, it’s likely that games from the previous generation (the Xbox 360 and PS3) could indeed be ported over to the Tegra K1. Don’t forget that Nvidia also showed off several demos, including Unreal Engine 4 and, of course, Serious Sam 3.
In the end, it’s worth remembering that mobile devices typically have far lower-resolution screens than a high-end desktop display or laptop, which also helps to offset the performance gap. Similarly, because the screen is physically smaller, lower-resolution textures, less texture filtering and fewer expensive post-processing effects are less noticeable, which helps keep the frame rate up. It’s certainly not going to be a case of 1080p at 60FPS gaming, but it’ll still be a definite improvement over the current generation of mobile hardware.
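A quick pixel count shows how much resolution matters. The sketch compares 1080p with 720p, where 720p and the 30FPS target are example choices rather than claims about any particular device. Per-frame cost only scales with pixel count for workloads that are fill-rate or shading bound.

```python
# Compare how many pixels must be rendered per frame and per second.

def pixel_count(width: int, height: int) -> int:
    return width * height

def pixels_per_second(width: int, height: int, fps: int) -> int:
    return pixel_count(width, height) * fps

ratio = pixel_count(1920, 1080) / pixel_count(1280, 720)
print(f"1080p renders {ratio:.2f}x as many pixels as 720p")
print(f"1080p at 60FPS: {pixels_per_second(1920, 1080, 60):,} pixels/s")
print(f"720p at 30FPS:  {pixels_per_second(1280, 720, 30):,} pixels/s")
```

1080p at 60FPS asks for 4.5 times the pixel throughput of 720p at 30FPS, which is why dropping resolution is such an effective lever on a mobile GPU.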