Warp3D Nova's Performance Boost Partially Hidden By Lazy Cow
Alain Thellier recently ported his Cow3D demo/test-program to Warp3D Nova, which allows Warp3D Nova and Warp3D to be compared. Results have been coming in thick and fast, along with speculation as to why the results are what they are. I was expecting Nova to deliver a larger boost, and it actually does; it's just partially hidden. So, let's see if we can make sense of it...
Here's a sampling of what people are typically getting:
Hardware | Warp3D | Warp3D Nova | Boost |
Sam460ex + Radeon HD 7750 | 130 fps | 352 fps | 2.7x |
A1-X1000 + Radeon R7 250X | 154 fps | 448 fps | 2.9x |
A1-X5000 + Radeon R9 280X | 354 fps | 775 fps | 2.2x |
So Nova gives the Cow3D scene a 2-3x performance boost. Not bad, although not as large as expected. Also noted is that high-end GPUs (e.g., Radeon R9 280X) are getting very similar results to mid-range GPUs (e.g., Radeon R7 250X) when using the same CPU.
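The Boost column is simply the Nova frame rate divided by the Warp3D frame rate. Here is a quick Python sketch that recomputes it from the table above (the variable and function names are mine, purely for illustration):

```python
# Frame rates (fps) copied from the table above: (Warp3D, Warp3D Nova).
results = {
    "Sam460ex + Radeon HD 7750": (130, 352),
    "A1-X1000 + Radeon R7 250X": (154, 448),
    "A1-X5000 + Radeon R9 280X": (354, 775),
}


def boost(warp3d_fps, nova_fps):
    """Speed-up factor of Warp3D Nova over classic Warp3D."""
    return nova_fps / warp3d_fps


for hardware, (w3d, nova) in results.items():
    print(f"{hardware}: {boost(w3d, nova):.1f}x")
```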
So what's happening? Why does the GPU power not seem to matter? And why is the X5000 doing so much better? To make sense of this you need some understanding of what affects GPU performance.
GPU Performance and Bottlenecks
Modern GPUs are complex beasts containing multiple systems. Render commands and data pass through multiple steps before they finally become pixels on the screen. Each step has a maximum throughput, and if that limit is reached then that step becomes the bottleneck that limits performance.
Here are some of the bottlenecks that can be encountered:
- CPU bottlenecks:
- CPU power - becomes a bottleneck when the CPU performs lots of calculations to generate the next frame. This includes updating the poses of all objects, performing memory/object management and generating the GPU's command stream
- PCIe bandwidth - commands, vertex/texture/shader-constant updates all need to be sent to the GPU. This becomes a bottleneck when the GPU processes incoming commands & data faster than the CPU can send it. On AmigaOS this limit is currently lower than it could be (as at Jan. 2017) because the CPU writes the data to VRAM rather than getting the GPU to fetch it using DMA (would love to fix this, but it'll take time)
- Render-calls/s - there's an upper limit to how many render calls the driver can send the GPU, which is a combination of CPU power and PCIe bandwidth. Apart from generating and sending the commands, there's also memory/object management to be done (locking/unlocking/tracking buffers)
- GPU bottlenecks:
- Command-processor speed - there is a maximum limit on how many incoming commands/s the GPU can execute
- VRAM bandwidth - reading/writing vertices, textures and pixels costs bandwidth; when the limit is reached then the GPU can go no faster
- Vertex fetch rate - the vertex fetch unit can have its own upper limit independent of VRAM bandwidth (vertices/s)
- Texel fetch rate - the texture fetch units also have their own maximum rate (texels/s)
- Fill rate - the render output unit has a limit on how many pixels it can write (pixels/s)
- GPU power - there's a maximum number of instructions per second a GPU can perform (GFLOPS)
As you can see, there are a lot of potential bottlenecks. Which one(s) you hit depends on what you're rendering:
- Render lots of tiny objects with few vertices and you'll likely hit the render-calls/s limit
- Render models with huge numbers of vertices and eventually the GPU's vertices/s limit will become a bottleneck
- Render a few objects at very high resolution and the fill-rate may become the bottleneck
- Finally, use large complicated shaders and you may hit the GPU's GFLOPS limit
The last 3 bottlenecks depend on how fast or slow your GPU is. Low-end GPUs will hit those limits much earlier than high-end ones.
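One simple way to picture this is as a pipeline whose overall frame rate is capped by its slowest stage. The sketch below is a deliberately crude model: the per-stage limits are made-up numbers chosen for illustration (not measurements of any real card), and real hardware overlaps stages in ways this ignores.

```python
def find_bottleneck(stage_limits_fps):
    """Return (stage name, fps) for the slowest stage, which caps the pipeline."""
    stage = min(stage_limits_fps, key=stage_limits_fps.get)
    return stage, stage_limits_fps[stage]


# Hypothetical maximum frame rate each stage could sustain for one scene.
# All of these numbers are invented for illustration only.
mid_range = {"PCIe transfers": 450, "vertex fetch": 2000,
             "fill rate": 3000, "shader ALU": 2500}
high_end = {"PCIe transfers": 450, "vertex fetch": 6000,
            "fill rate": 9000, "shader ALU": 8000}

for name, limits in (("mid-range GPU", mid_range), ("high-end GPU", high_end)):
    stage, fps = find_bottleneck(limits)
    print(f"{name}: limited to {fps} fps by {stage}")
```

In this toy example both cards end up capped by the same CPU-side stage, even though their GPU-side limits are very different.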
Which Bottleneck Limits Cow3D's Performance?
Short answer: PCIe bandwidth is the main bottleneck. Medium and high-end GPUs have similar performance because they're all rendering faster than the CPU can feed them. Warp3D Nova reduces the bandwidth needed by storing the Cow3D model's vertices in VRAM, but there are still commands and data to transfer. The command stream is bigger than you might think (lots of state to set up and maintain), and there's also data such as the cow's orientation that gets sent to the GPU every frame.
That's only part of the picture, though. I didn't realise it initially, but there's also something else consuming PCIe bandwidth: the fps counter and info-bar at the top of the window. Text rendering is still done in software on the CPU and, consequently, is relatively slow. It also consumes a fair bit of PCIe bandwidth. Cow3D's fps counter measures the overall fps including the time taken to render the info-bar. It's not measuring purely the Warp3D/Nova performance; results are skewed by the time taken to render the info-bar text.
PCIe bandwidth also explains why the A1-X5000's results are higher; it has a pretty good PCIe controller with higher throughput for CPU-based transfers.
The Actual Performance (Minus Info-Bar Skew)
Alain includes Cow3D's source-code, so I built a custom version that doesn't render the info-bar (but does output the fps to the console). Here's the result:
Hardware | Warp3D | Warp3D Nova | Boost |
A1-X1000 + Radeon HD 7770 | 218 fps | 1396 fps | 6.4x |
Boom! Just like that, the performance boost more than doubles (and the base Warp3D fps rises too). The info-bar was skewing results quite a bit.
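To see how much time the info-bar actually costs, convert the frame rates to frame times and subtract. The sketch below uses the with-bar/without-bar figures that one commenter (ddni, X1000 + R9 270X) reported below. It assumes the info-bar adds a fixed amount of time to every frame, which is a simplification, and the function names are my own.

```python
def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def info_bar_cost_ms(fps_with_bar, fps_without_bar):
    """Extra time per frame attributed to the info-bar (assumed fixed)."""
    return frame_time_ms(fps_with_bar) - frame_time_ms(fps_without_bar)


# ddni's reported results: (fps with bar, fps without bar).
warp3d = (152, 206)
nova = (446, 1558)

for api, (with_bar, without_bar) in (("Warp3D", warp3d), ("Nova", nova)):
    cost = info_bar_cost_ms(with_bar, without_bar)
    print(f"{api}: {frame_time_ms(without_bar):.2f} ms per frame without the bar, "
          f"info-bar adds about {cost:.1f} ms")
```

Under that assumption the info-bar costs roughly 1.6 to 1.7 ms per frame in both cases. That is modest next to Warp3D's frame time of nearly 5 ms, but it is more than twice Nova's frame time of about 0.64 ms, which is why the bar hides so much of Nova's gain.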
I'd love to see what you get on your hardware, so you can download the modified Cow3D here: Cow3D6-NoInfoBar.lha. Please post your results in the comments below.
So, There's No Point in Getting A High-End Card?
I've seen a few people jump to this conclusion, and I want to make a case for high-end cards. Sure, high-end cards won't make much difference for Cow3D, but that's just one test. If Alain doubled the number of vertices in the cow then lower-end cards would probably hit their vertices/s limit. At that point Cow3D's fps would be more GPU dependent. In fact, if you had a really low-end GPU (e.g., a Radeon R7 240) then you'd already notice lower fps than with better cards. Now, let's say we double the number of vertices again... or used complex shaders... or increased the resolution... or... I hope you get my point.
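Here's a rough sketch of that vertex-doubling thought experiment. It models the frame rate as the lower of a CPU-side limit and the GPU's vertex rate divided by the vertices per frame. Every number in it (the CPU limit, the vertex rates and the vertex counts) is hypothetical and chosen only to show the shape of the effect; none of them are Cow3D's real figures.

```python
def estimated_fps(cpu_limit_fps, gpu_vertices_per_s, vertices_per_frame):
    """Frame rate capped by whichever is lower: the CPU side or vertex throughput."""
    gpu_limit_fps = gpu_vertices_per_s / vertices_per_frame
    return min(cpu_limit_fps, gpu_limit_fps)


CPU_LIMIT_FPS = 1400  # hypothetical CPU/PCIe-side cap
GPUS = {"low-end": 50e6, "high-end": 400e6}  # hypothetical vertices/s

for vertices in (10_000, 20_000, 40_000, 80_000):  # hypothetical model sizes
    line = ", ".join(
        f"{name}: {estimated_fps(CPU_LIMIT_FPS, rate, vertices):.0f} fps"
        for name, rate in GPUS.items()
    )
    print(f"{vertices} vertices -> {line}")
```

With these made-up numbers both cards sit at the CPU-side cap for the smaller models, and only the low-end card starts dropping once the vertex count has been doubled a couple of times.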
The bottom line is: Cow3D and other benchmarks only give you the performance under specific conditions. Getting 100 fps with one game engine won't tell you how well another game will perform. This is why benchmarks like 3DMark run a suite of tests, each probing different limits/conditions.
When more Warp3D-Nova/OpenGL-ES-2 software is released (and there are some in the works), it's highly likely that some will work better on higher-end graphics cards. You're welcome to buy the cheapest card you can find, but you may find that card limiting in future. Of course, I have no idea when that day will come...
14 Comments
Spectre660 12/08/2018 12:01am (6 years ago)
Nova: up to 1544 fps
(Second run gives up to 1605 fps)
Spectre660 12/04/2018 9:15am (7 years ago)
Cow without bar:
Nova: 1387 fps
Plain W3D: 385 fps
Spectre660 12/04/2018 9:12am (7 years ago)
Cow without bar:
Nova: 1445 fps
Plain W3D: 392 fps
Spectre660 11/04/2018 11:45pm (7 years ago)
Cow without bar:
Nova: 1854 fps
Plain W3D: 421 fps
Sinan Gürkan 29/01/2017 12:43am (8 years ago)
Cow3D6-NoInfoBar-Nova: 1439 fps
Cow3D6-NoInfoBar: 378 fps
GS 28/01/2017 5:39pm (8 years ago)
X1000, Radeon 7950-3GB DDR5, AmigaOS4.1 FE-Update 1
Warp 3D: 153 FPS
Warp3D-Nova: 450 FPS
new Cow3D-no info bar by Hans
Warp3D: 187 FPS
Warp3D-Nova: 1112 FPS
THANKS HANS !!!
Paolo Vanni 22/01/2017 6:29am (8 years ago)
Radeon R7 250X AmigaOS 4.1 FE Upd 1
coW3D6-os4 bar/no bar 170/234 fps
coW3D6-os4-Nova bar/no bar 470/1560 fps
tommysammy 21/01/2017 4:54am (8 years ago)
plain W3D: 183 fps
Nova: 1250 fps
On X1000 with RadeonHD 7850
Maverick 20/01/2017 11:30pm (8 years ago)
Sapphire R7 250 1GB DDR5
EngineClock 1000MHz, boosted 1050 MHz
ddni 20/01/2017 11:15pm (8 years ago)
R9 270x on X1000 4.1FEu1
Warp3D: bar / nobar = 152 / 206 fps Increase 1.35x
Nova: bar / nobar = 446 / 1558 fps Increase 3.5x
Cow3D6: build Jan 19 2017 at 12:22:36
CheckWarp3D:
============================================================
Current bitmap's format is W3D_FMT_A8R8G8B8 (2048)
============================================================
Hardware W3D_Driver available
============================================================
W3D_Driver <AMD Radeon HD Southern Islands> soft:0 ChipID:13 formats:18840
W3D_Driver <Permedia2> soft:0 ChipID:3 formats:31231
W3D_Driver <Voodoo Napalm> soft:0 ChipID:9 formats:31200
W3D_Driver <Voodoo Avenger> soft:0 ChipID:8 formats:160
4 Driver(s) installed
============================================================
WARNING: You have 4 Warp3D drivers installed !!!
============================================================
W3D_Driver <AMD Radeon HD Southern Islands> soft:0 ChipID:13 formats:18840
============================================================
206 fps
Cow3D6: build Jan 19 2017 at 12:22:36
CheckNova:
============================================================
Current bitmap's destformat is PIXF_A8R8G8B8 (BmFmt:11 PixF:6)
============================================================
============================================================
GPU0: AMD Radeon Southern Islands (W3DN_SI.library)
1 Driver(s) installed
============================================================
============================================================
1558 fps
Spectre660 20/01/2017 11:04pm (8 years ago)
Cow without bar:
plain W3D: 183 fps
Nova: 1286 fps
Spectre660 20/01/2017 11:01pm (8 years ago)
Cow without bar:
plain W3D: 177 fps
Nova: 1131 fps
Rob 20/01/2017 6:28pm (8 years ago)
Warp3D = 209 fps (219 fps when 'b' pressed)
Nova3D = 1200 fps
* Please include the core speed of your video card since it can vary from model to model.
Maverick 20/01/2017 11:18am (8 years ago)
Cow with bar:
plain W3D: 137 fps
Nova: 427 fps
Cow without bar:
plain W3D: 179 fps
Nova: 1308 fps