Comments on: Diving Deep Into The Nvidia Ampere GPU Architecture
https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: emerth https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/#comment-146245 Wed, 10 Jun 2020 05:55:22 +0000
In reply to BlackDove.

Afaik the tensor cores implement less precise math than the shaders.
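For the reduced-precision tensor core modes such as TF32, the gap the comment alludes to can be sketched by truncating an FP32 value to TF32's 10-bit mantissa. This is only a hypothetical illustration in plain C++: real tensor cores round to nearest rather than truncate and accumulate in FP32, so this shows the input precision gap, nothing more.

```cpp
#include <cstdio>
#include <cstdint>
#include <cstring>

// TF32 keeps FP32's 8 exponent bits but only 10 of its 23 mantissa bits.
// Masking off the 13 low-order mantissa bits mimics that loss of precision.
static float to_tf32_like(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFE000u;  // clear the 13 low-order mantissa bits
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

int main() {
    float v = 1.0f / 3.0f;
    std::printf("FP32 value: %.9f\nTF32-like : %.9f\n", v, to_tf32_like(v));
    return 0;
}
```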

By: anon https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/#comment-145387 Fri, 29 May 2020 17:05:21 +0000
For clarification, a PCIe Gen4 x16 link should get you 2 GB/s per lane x 16 = 32 GB/s in each direction, i.e. 64 GB/s full duplex… what is the total bandwidth that they support…?
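For reference, the same arithmetic worked through in a small C++ snippet, with the 128b/130b line encoding used by PCIe Gen3/Gen4 factored in; the 2 GB/s-per-lane figure is the raw rate before that overhead.

```cpp
#include <cstdio>

int main() {
    const double gtps_per_lane = 16.0;           // PCIe Gen4: 16 GT/s per lane
    const double encoding      = 128.0 / 130.0;  // 128b/130b line code
    const int    lanes         = 16;

    double gbps_per_lane = gtps_per_lane * encoding / 8.0;  // ~1.97 GB/s per direction
    double per_direction = gbps_per_lane * lanes;           // ~31.5 GB/s for x16
    double full_duplex   = per_direction * 2.0;             // ~63 GB/s both ways

    std::printf("per lane: %.2f GB/s, x16 one way: %.1f GB/s, both ways: %.1f GB/s\n",
                gbps_per_lane, per_direction, full_duplex);
    return 0;
}
```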

By: BlackDove https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/#comment-145380 Fri, 29 May 2020 12:52:42 +0000
I’ve been wondering: why use the normal FP64 cores when the FP64 tensor units have double the performance? Do some applications just not run on the tensor units?

Also, would the HPL results for systems using the A100 GPU reflect the standard FP64 cores’ performance or the tensor cores’?
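A minimal CUDA sketch of the distinction the question turns on, assuming an A100 with CUDA 11 and cuBLAS: the FP64 tensor cores only accelerate matrix multiply-accumulate, so a library GEMM such as cublasDgemm can be routed to them, while general element-wise FP64 code like the kernel below runs on the standard FP64 units. Compile with nvcc and link against cuBLAS (-lcublas).

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

// General-purpose FP64 work: nothing here maps onto a tensor-core
// matrix-multiply shape, so it executes on the ordinary FP64 ALUs.
__global__ void axpy_fp64(int n, double a, const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    double *dA, *dB, *dC, *dX, *dY;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dB, n * n * sizeof(double));
    cudaMalloc(&dC, n * n * sizeof(double));
    cudaMalloc(&dX, n * sizeof(double));
    cudaMalloc(&dY, n * sizeof(double));
    cudaMemset(dA, 0, n * n * sizeof(double));
    cudaMemset(dB, 0, n * n * sizeof(double));
    cudaMemset(dX, 0, n * sizeof(double));
    cudaMemset(dY, 0, n * sizeof(double));

    // Matrix multiply: on an A100, cuBLAS can issue FP64 tensor-core
    // instructions for this call, which is where the doubled throughput lives.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    // Element-wise update: stays on the standard FP64 cores.
    axpy_fp64<<<(n + 255) / 256, 256>>>(n, 2.0, dX, dY);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC); cudaFree(dX); cudaFree(dY);
    return 0;
}
```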

By: Timothy Prickett Morgan https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/#comment-145377 Fri, 29 May 2020 12:12:12 +0000
In reply to Peter Eid.

I got a bad steer from someone at Nvidia, and as it turns out, the clock speed and SM extension would yield the same result. Funny, that. Thanks for the help.

By: Peter Eid https://www.nextplatform.com/2020/05/28/diving-deep-into-the-nvidia-ampere-gpu-architecture/#comment-145362 Fri, 29 May 2020 07:55:27 +0000
Nice read!
You pointed out that the Tesla V100S is a fully enabled GV100, but according to what I could find online, that is not the case: it is still a 5,120-CUDA-core/80-SM design. See, for example, https://www.pny.eu/en/professional/explore-all-products/nvidia-tesla/1279-tesla-v100s-32gb
Finally, the last sentence of the paragraph mentioning the V100S is cut short… (“At some point, that other 8 GB of memory, an increase of”)
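For reference, the core count follows directly from the SM count, since each GV100 SM carries 64 FP32 CUDA cores; a trivial sketch of the arithmetic (84 SMs being the fully enabled die):

```cpp
#include <cstdio>

int main() {
    const int fp32_cores_per_sm = 64;  // Volta GV100 SM
    const int v100s_sms         = 80;  // V100 / V100S as shipped
    const int full_gv100_sms    = 84;  // fully enabled GV100 die

    std::printf("V100S:      %d SMs x %d = %d CUDA cores\n",
                v100s_sms, fp32_cores_per_sm, v100s_sms * fp32_cores_per_sm);
    std::printf("Full GV100: %d SMs x %d = %d CUDA cores\n",
                full_gv100_sms, fp32_cores_per_sm, full_gv100_sms * fp32_cores_per_sm);
    return 0;
}
```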
