AMD Mantle Released – Performance & AMD Talks To Us About Mantle Optimizations

AMD’s long-awaited Mantle technology has been released with the Catalyst 14.1 Beta, and we can try it out with a new client update for Battlefield 4 (BF4). While we’re certainly seeing improvements in Battlefield 4, there has been some confusion about what the Mantle API actually does and does not do. It was widely assumed that the main purpose of the Mantle API was to give developers more direct access to the GPU’s functions, improving performance by effectively ‘accessing the GPU’s metal’.

There have been reports that this isn’t the case, and that Mantle in fact primarily boosts lower-performance CPUs and doesn’t really help in scenarios where the GPU is the weakest link. In other words, if the CPU is fast enough but the GPU is the slower part, performance gains would be marginal. I reached out to AMD for comment; Robert Hallock responded and also sent over a “Mantle Primer” covering the design and purpose of Mantle as it currently stands.

Robert Hallock: “We’ve made it abundantly clear that Mantle can represent a significant uplift in any scenario, but the benefit will be more pronounced in CPU-bound scenarios than GPU-bound scenarios. Even with that in mind, let me show you several GPU-bound scenarios that show a huge uplift as a direct result of Mantle:

StarSwarm 1080p @ Medium settings with an i7-4960X and Radeon R7 260X:
RTS test DirectX: 13.95 FPS (Unplayable!)
RTS test Mantle: 31.69 FPS (126% faster)
Attract test DirectX: 30.2 FPS
Attract test Mantle: 40.17 FPS (32% faster)
Shmup test DirectX: 25 FPS (Unplayable)
Shmup test Mantle: 44 FPS (72% faster)

Right here are three examples where Mantle is making a massive performance improvement on a GPU-bound system, and two of them turned unplayable framerates into playable framerates. Everyone needs to have a nuanced understanding of the fact that the engine, the type of game, developer optimization, and system config all have a significant impact on the overall performance.”

More Mantle Benchmarks

The numbers Robert provided above are certainly impressive, but they come from a system with a very high-end Intel CPU and don’t tell the whole story. AMD have released further benchmarks showing how Mantle scales across both Intel and AMD processors, paired with different Radeon graphics cards.

Core i7-4960X CPU + R9 290X GPU
1080p, Ultra Preset, 4xAA: 9.2% improvement with Mantle
1600p, Ultra Preset, 4xAA: 10% improvement with Mantle

Core i7-4960X CPU + R7 260X GPU
1080p, Ultra Preset, 4xAA: 2.7% improvement
1600p, Ultra Preset, 4xAA: 1.4% improvement

A10-7700K CPU + R9 290X GPU
1080p, Ultra Preset, 4xAA: 40.9% improvement
1600p, Ultra Preset, 4xAA: 17.3% improvement

A10-7700K CPU + R7 260X GPU
1080p, Ultra Preset, 4xAA: 8.3% improvement
1600p, Low Preset: 16.8% improvement
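
One way to read these numbers is with a deliberately simplified bottleneck model, in which each frame takes as long as the slower of the CPU and the GPU. The sketch below uses made-up millisecond costs (assumptions for illustration, not measurements from AMD or from these benchmarks) and assumes a lower-overhead API only shrinks the CPU side. The function names are hypothetical.

```python
def frame_time_ms(cpu_ms, gpu_ms):
    """Toy model: the frame is done when the slower of CPU and GPU is done."""
    return max(cpu_ms, gpu_ms)


def improvement_percent(cpu_ms, gpu_ms, cpu_scale):
    """Percent gain in frame rate when the CPU cost is multiplied by cpu_scale."""
    before = frame_time_ms(cpu_ms, gpu_ms)
    after = frame_time_ms(cpu_ms * cpu_scale, gpu_ms)
    return (before / after - 1.0) * 100.0


# Illustrative numbers only (assumptions, not benchmark data).
# Slow CPU, fast GPU: the CPU is the bottleneck, so cutting CPU cost helps a lot.
cpu_bound = improvement_percent(cpu_ms=20.0, gpu_ms=12.0, cpu_scale=0.5)

# Fast CPU, slow GPU: the GPU is the bottleneck, so cutting CPU cost changes nothing.
gpu_bound = improvement_percent(cpu_ms=8.0, gpu_ms=25.0, cpu_scale=0.5)

print(f"CPU-bound system: {cpu_bound:.1f}% faster")
print(f"GPU-bound system: {gpu_bound:.1f}% faster")
```

Real systems are messier than this: the benchmarks above still show a gain of a few percent on the R7 260X configurations, which this toy model would put at zero.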

Mantle – For CPU or GPU?

AMD have told me: “Mantle API (Application Programming Interface) ‘in its current iteration’ uniquely leverages the hardware in the GCN GPUs. More broadly, Mantle is functionally similar to DirectX® and OpenGL, but Mantle is different in that it was purpose-built as a lower level API. By ‘lower level,’ it’s meant that the language of Mantle more closely matches the way modern graphics architectures (like AMD’s own GCN) are designed to execute code. The primary benefit of a lower level API is a reduction in software bottlenecks, such as the time a GPU and CPU must spend translating/understanding/reorganizing code on-the-fly before it can be executed and presented to the user as graphics. Mantle comes in contrast to the ‘high level API,’ which offers broader compatibility with multiple GPU architectures, but does so at the expense of lower performance and efficiency.”

AMD make it clear that Mantle is designed primarily to boost performance where the CPU is the limiting factor (commonly known as being ‘CPU bound’). Current APIs don’t scale rendering work well across multiple CPU cores, so those with low-end to mid-range processors will see a larger performance jump. AMD have tackled this with a variety of techniques, including:

Low-overhead validation and processing of API commands
Explicit command buffer control
Close to linear performance scaling from recording command buffers onto multiple CPU cores
Reduced runtime shader compilation overhead
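
To give a feel for what recording command buffers on multiple CPU cores means, here is a conceptual Python sketch. It is not the Mantle API: `CommandBuffer`, `record_chunk` and `build_frame` are hypothetical names invented for illustration, and the 'commands' are just strings. The idea is that each worker records its own buffer for a slice of the scene without touching shared state, and the finished buffers are then submitted in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class CommandBuffer:
    """Hypothetical stand-in for a GPU command buffer: an ordered list of commands."""
    commands: list = field(default_factory=list)

    def draw(self, object_id):
        self.commands.append(f"draw {object_id}")


def record_chunk(object_ids):
    """Record one command buffer for a slice of the scene, with no shared state."""
    buf = CommandBuffer()
    for object_id in object_ids:
        buf.draw(object_id)
    return buf


def build_frame(object_ids, workers):
    """Split the scene into roughly one chunk per worker and record them in parallel."""
    chunk_size = max(1, -(-len(object_ids) // workers))  # ceiling division
    chunks = [object_ids[i:i + chunk_size]
              for i in range(0, len(object_ids), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so submission order is deterministic.
        return list(pool.map(record_chunk, chunks))


buffers = build_frame(list(range(10)), workers=4)
submitted = [cmd for buf in buffers for cmd in buf.commands]
print(len(buffers), "buffers,", len(submitted), "commands")
```

In CPython the global interpreter lock means this won’t actually run faster with more threads; the sketch only shows the structure that lets a native engine spread recording work across cores.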

AMD readily admit that where a title is ‘GPU bound’, because it pushes the GPU to the limit of its resources, a new API will be less effective. Mantle does include features to improve GPU-bound scenarios, but it is for the most part down to the game’s developer to use them. AMD have provided several examples of Mantle features that can help performance when a title is indeed ‘GPU bound’:

Reduction of command buffer submissions
Explicit control of resource compression, expands and synchronizations
Asynchronous DMA queue for data uploads independent from the graphics engine
Asynchronous compute queue for overlapping of compute and graphics workloads
Data format optimizations via flexible buffer/image access
Advanced Anti-Aliasing features for MSAA/EQAA optimizations

AMD are keen to point out that Mantle is still in beta, that they’re not finished improving performance, and that they will keep working on it over the coming months. Unfortunately, it doesn’t rest in AMD’s hands alone, but also in those of game developers, who must invest the time to learn the technology and develop better tools to help their titles run on the hardware. Currently, in AMD’s own words, developers are “still familiarizing themselves with Mantle and its relationship to Graphics Core Next.” With multi-platform releases, there’s a good possibility that not all titles will be as optimized as they could be.

There are 20 titles currently in development for AMD’s Mantle API, but it remains to be seen just how many game developers are willing to embrace it. For those with AMD cards, it’s pretty much free performance, and it scales extremely well with multi-GPU configurations (AMD’s own CrossFire). If nothing else, it serves to highlight Microsoft’s need to improve their own DX11 API and further reduce its CPU overhead.

Battlefield 4 and Mantle

The Battlefield 4 patch has been released and is downloadable via Origin; it weighs in at a fairly hefty 1.23GB. You’ll need a GCN GPU, Windows 7 or 8, and the AMD Catalyst 14.1 Beta drivers. If your game hasn’t auto-updated, here’s what you need to do to install the patch and enable Mantle in BF4:

Open Origin
When it asks you for access control, click Yes.
Origin will verify your Battlefield 4 game files. Get a coffee while you wait for the download to finish (or two, if you’ve a slower internet connection).
Start Battlefield 4 ‘Single Player’ or ‘Multiplayer’
Go to Video settings and select Mantle instead of DirectX (Note: An application restart might be required).
Start those benchmarks!

Speaking of benchmarks, DICE have updated their own blog with performance numbers.

Below are DICE’s own internal performance test numbers:

Test case 1: Low-end single-player
CPU: AMD A10-7850K (‘Kaveri’ APU), 4 cores @ 3.7 GHz
GPU: N/A
Settings: 720p Medium
OS: Windows 7 64-bit
Level: Singapore
DX11 avg: 26.6 ms/f (37.6 fps)
Mantle avg: 23.3 ms/f (43 fps)
Improvement: 14% faster

Test case 2: 64-player multi-player
CPU: AMD FX-8350, 8 cores @ 4 GHz
GPU: AMD Radeon 7970 3 GB
Settings: Ultra 1080p
OS: Windows 8 64-bit
Level: Siege of Shanghai
DX11 avg: 18.87 ms/f (52.9 fps)
Mantle avg: 15.08 ms/f (66.3 fps)
Improvement: 25.1% faster

Test case 3: Multi-GPU single-player
CPU: Intel Core i7-3970x Extreme, 12 logical cores @ 3.5 GHz
GPU: 2x AMD Radeon R9 290x 4 GB
Settings: Ultra 1080p 4x MSAA
OS: Windows 8 64-bit
Level: South China Sea
DX11 avg: 13.24 ms/f (78.4 fps)
Mantle avg: 8.38 ms/f (121.5 fps)
Improvement: 58% faster
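
DICE report average frame time in milliseconds per frame (ms/f) alongside frames per second. As a quick check of how the two relate, the sketch below converts frame time to FPS and works out the ‘faster’ figure as the ratio of the DX11 and Mantle frame times, using the published numbers for test cases 1 and 2. The helper names are made up for this example, and this is only one plausible way the percentages could have been derived.

```python
def fps_from_frame_time(ms_per_frame):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def percent_faster(dx11_ms, mantle_ms):
    """How much faster Mantle is, as a percentage, from average frame times."""
    return (dx11_ms / mantle_ms - 1.0) * 100.0


# Average frame times (ms/f) from DICE's test cases 1 and 2: (DX11, Mantle).
results = {
    "Test case 1": (26.6, 23.3),
    "Test case 2": (18.87, 15.08),
}

for name, (dx11_ms, mantle_ms) in results.items():
    print(f"{name}: DX11 {fps_from_frame_time(dx11_ms):.1f} fps, "
          f"Mantle {fps_from_frame_time(mantle_ms):.1f} fps, "
          f"{percent_faster(dx11_ms, mantle_ms):.1f}% faster")
```

For these two cases the computed improvements land within a fraction of a percent of DICE’s stated 14% and 25.1%.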

DICE also mention the improved benchmarking tools in Battlefield 4:

“To simplify measuring performance in the game we’ve added a new tool to the in-game console to record frame times for later analysis. Simply run ‘PerfOverlay.FrameFileLogEnable 1’ to start saving frame times and ‘PerfOverlay.FrameFileLogEnable 0’ to stop. The resulting .csv file will be located in Documents/Battlefield 4 which can be opened & graphed by Excel or other applications for viewing. Another in-game tool that is useful to use is ‘Render.DrawScreenInfo 1’ that will now show additional on-screen information about your CPU & GPU config, resolution and as well as if Mantle or DirectX 11 is used for rendering.”
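
As a rough sketch of the ‘later analysis’ step, the Python below averages a column of frame times from such a .csv file. The layout of the file isn’t described here, so the code assumes a header row and takes the name of the frame-time column as a parameter; `frame_ms` and the sample data are placeholders, not the real log format.

```python
import csv
import io
import statistics


def summarize_frame_times(csv_file, column):
    """Summarize one column of frame times (in milliseconds) from an open CSV file."""
    times = [float(row[column]) for row in csv.DictReader(csv_file)]
    avg_ms = statistics.mean(times)
    return {
        "frames": len(times),
        "avg_ms": avg_ms,
        "avg_fps": 1000.0 / avg_ms,  # FPS derived from the average frame time
        "worst_ms": max(times),      # the single slowest frame
    }


# Made-up sample standing in for a real log file (assumed column name).
sample = io.StringIO("frame_ms\n16.0\n17.0\n15.0\n32.0\n")
summary = summarize_frame_times(sample, column="frame_ms")
print(summary)
```

For a real log you would pass an open file instead, for example `open(path, newline="")`, and change the column name to match the header in your file.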
