Comments on: Intel Shoots “Granite Rapids” Xeon 6 Into The Datacenter https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Fri, 31 Jan 2025 00:28:44 +0000

By: Pierre Martinow https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-247098 Fri, 31 Jan 2025 00:28:44 +0000 https://www.nextplatform.com/?p=144739#comment-247098 I am looking at the 6980P for inference of DeepSeek’s R1 model, which could conveniently run on a single node with sufficient DDR5 RAM. The alternative, loading the model fully into VRAM on one node, would be more expensive (more VRAM, more GPUs). So I was wondering whether the 6980P has FP16/BF16 performance comparable to that of current data center accelerators.

By: Matt Jones https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-237199 Wed, 09 Oct 2024 23:20:52 +0000 https://www.nextplatform.com/?p=144739#comment-237199 In reply to Matt Jones.

To be more exact, I should have written that on a power-constrained chip, the performance ratio between a design that increases IPC by 5% and power by 0% and a design that increases IPC by 15% and power by 30% is 1.19, because 1.05 / (1.15/1.3) = 1.19.
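
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes that bringing a chip back inside its power budget costs performance in direct proportion to the power cut (fewer cores or lower clocks), which is the same simplification the comment uses.

```python
def power_constrained_perf(ipc_gain: float, power_gain: float) -> float:
    """Relative performance once the design is scaled back to the original power budget.

    Assumption: performance and power both scale linearly with whatever is
    reduced (clock or core count) to stay inside the budget.
    """
    return ipc_gain / power_gain


option_a = power_constrained_perf(1.05, 1.00)  # +5% IPC, +0% power
option_b = power_constrained_perf(1.15, 1.30)  # +15% IPC, +30% power
ratio = option_a / option_b

print(f"option A: {option_a:.3f}, option B: {option_b:.3f}, A/B: {ratio:.2f}")
```

Under that assumption the second option ends up below 1.0, meaning it is slower than the starting point at the same power, and the first option comes out roughly 19% ahead of it.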

It is definitely not true that “on average the two options are about the same in a power-constrained world”, which is what the Senior Intel Fellow was quoted as saying. Perhaps Timothy Prickett Morgan could ask the Senior Intel Fellow if he really meant to say “on average the two options are about the same in a power-constrained world”.

By: Timothy Prickett Morgan https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-237181 Wed, 09 Oct 2024 17:08:27 +0000 https://www.nextplatform.com/?p=144739#comment-237181 In reply to Tom Miller.

Thanks, Tom. I did not see that.

By: Tom Miller https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-237084 Tue, 08 Oct 2024 10:28:13 +0000 https://www.nextplatform.com/?p=144739#comment-237084 Intel added recommended customer prices for Granite Rapids to their website, which I reproduced below. I also included the peak FP64 performance at the base frequency.

Part    Cores / SPs  Price   Clock    Power  Peak FP64
6980P   128 cores    $17800  2.0 GHz  500W   8.2 TFLOPs
6979P   120 cores    $15750  2.1 GHz  500W   8.1 TFLOPs
6972P   96 cores     $14600  2.4 GHz  500W   7.4 TFLOPs
6952P   96 cores     $11400  2.1 GHz  400W   6.5 TFLOPs
6960P   72 cores     $13750  2.7 GHz  500W   6.2 TFLOPs
MI300X  19456 SPs    $15000  2.1 GHz  750W   163.4 TFLOPs
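
For anyone wanting to reproduce the peak FP64 column, here is a rough sketch. The per-clock rates are assumptions on my part, not something stated in the comment: 32 FP64 FLOPs per clock per Xeon core (two AVX-512 FMA units, 8 doubles each, 2 operations per FMA) and 4 FP64 FLOPs per clock per MI300X stream processor (the matrix rate). With those assumptions the numbers in the table fall out.

```python
# Per-clock rates are assumptions, not taken from the comment.
XEON_FP64_FLOPS_PER_CLOCK = 32    # assumed: 2 AVX-512 FMA units x 8 doubles x 2 ops
MI300X_FP64_FLOPS_PER_CLOCK = 4   # assumed: FP64 matrix rate per stream processor


def peak_tflops(units: int, ghz: float, flops_per_clock: int) -> float:
    """Peak TFLOPs = execution units x clock (GHz) x FLOPs per clock / 1000."""
    return units * ghz * flops_per_clock / 1000.0


# (cores, clock in GHz) copied from the table
xeons = {
    "6980P": (128, 2.0),
    "6979P": (120, 2.1),
    "6972P": (96, 2.4),
    "6952P": (96, 2.1),
    "6960P": (72, 2.7),
}

peaks = {
    name: round(peak_tflops(cores, ghz, XEON_FP64_FLOPS_PER_CLOCK), 1)
    for name, (cores, ghz) in xeons.items()
}
mi300x_peak = round(peak_tflops(19456, 2.1, MI300X_FP64_FLOPS_PER_CLOCK), 1)

for name, tflops in peaks.items():
    print(f"{name}: {tflops} TFLOPs")
print(f"MI300X: {mi300x_peak} TFLOPs")
```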

The 96-core 400W SKU is the version of Granite Rapids with the best FP64 performance per dollar, but AMD’s MI300X has 19x better FP64 performance per dollar. The 128-core 500W SKU is the version of Granite Rapids with the best FP64 performance per Watt, but the MI300X has 13x better FP64 performance per Watt, and its peak FP64 performance is 20x better. If Intel can’t make a datacenter GPU that customers want to buy, Intel will have to provide the option of on-package floating-point accelerators to narrow this performance gap with GPUs. Diamond Rapids is rumored to have an accelerator tile below each CPU tile, similar to AMD’s 3D V-Cache, but for accelerators. Granite Rapids only has vector instructions for FP64, while datacenter GPUs have both matrix and vector instructions for FP64.
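
The ratios quoted here can be recomputed directly from the price, power, and peak FP64 figures listed in the table. A minimal sketch follows; the tuple layout and helper names are just for illustration.

```python
# (name, price in USD, power in W, peak FP64 in TFLOPs), copied from the table
xeons = [
    ("6980P", 17800, 500, 8.2),
    ("6979P", 15750, 500, 8.1),
    ("6972P", 14600, 500, 7.4),
    ("6952P", 11400, 400, 6.5),
    ("6960P", 13750, 500, 6.2),
]
mi300x = ("MI300X", 15000, 750, 163.4)


def per_dollar(part):
    """Peak FP64 TFLOPs per dollar of list price."""
    return part[3] / part[1]


def per_watt(part):
    """Peak FP64 TFLOPs per Watt of rated power."""
    return part[3] / part[2]


best_dollar = max(xeons, key=per_dollar)
best_watt = max(xeons, key=per_watt)
best_peak = max(xeons, key=lambda part: part[3])

dollar_ratio = per_dollar(mi300x) / per_dollar(best_dollar)
watt_ratio = per_watt(mi300x) / per_watt(best_watt)
peak_ratio = mi300x[3] / best_peak[3]

print(f"best per dollar: {best_dollar[0]}, MI300X advantage {dollar_ratio:.1f}x")
print(f"best per Watt:   {best_watt[0]}, MI300X advantage {watt_ratio:.1f}x")
print(f"best peak:       {best_peak[0]}, MI300X advantage {peak_ratio:.1f}x")
```

Note that this compares list prices and rated power only; it says nothing about sustained performance on real workloads.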

By: HuMo https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235903 Fri, 27 Sep 2024 10:53:54 +0000 https://www.nextplatform.com/?p=144739#comment-235903 In reply to Timothy Prickett Morgan.

Oh yes! And look at that amazing high-flying top-rope-diving double-sledge polish-hammer that sees the Granite Rock’s Mr.DIMM tear down that memory wall, in LULESH, in HPCG, and in Xcompact3D — not to mention the tilt-a-whirl wheelbarrow that AMX does on OpenVino — that 6980P chip’s got some “choice” moves (on top of efficiency)! 8^p ( pages 4,5,10: https://www.phoronix.com/review/intel-xeon-6980p-performance/4 )

By: Jeff Harris https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235891 Fri, 27 Sep 2024 08:13:09 +0000 https://www.nextplatform.com/?p=144739#comment-235891 If you get a chance the next time you are talking with an Intel architect, please ask why Granite Rapids has no Xeon Max version with HBM, like Sapphire Rapids. A processor with both HBM and MRDIMMs would be great for AI, simulation, modeling and other HPC applications.

A future Xeon processor, like Diamond Rapids, might have 16 channels of 12.8 GTransfers/sec MRDIMMs, which would provide a total DRAM bandwidth of 1.6 TBytes/sec. If this future processor has a Xeon Max version with 4 stacks of HBM3E, the total DRAM bandwidth (HBM3E + MRDIMM) would be increased by 4x compared to the MRDIMM-only version.
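
A back-of-the-envelope version of that bandwidth estimate is sketched below. Two inputs are assumptions rather than figures from the comment: each memory channel moves 8 bytes (64 data bits) per transfer, and each HBM3E stack delivers about 1.2 TBytes/sec. The processor itself is hypothetical.

```python
BYTES_PER_TRANSFER = 8     # assumed: 64 data bits per channel per transfer
HBM3E_STACK_TBPS = 1.2     # assumed: approximate bandwidth of one HBM3E stack


def dimm_bandwidth_tbps(channels: int, gtransfers_per_sec: float) -> float:
    """Aggregate DIMM bandwidth in TBytes/sec."""
    return channels * gtransfers_per_sec * BYTES_PER_TRANSFER / 1000.0


mrdimm_tbps = dimm_bandwidth_tbps(channels=16, gtransfers_per_sec=12.8)
hbm_tbps = 4 * HBM3E_STACK_TBPS
total_tbps = mrdimm_tbps + hbm_tbps
increase = total_tbps / mrdimm_tbps

print(f"MRDIMM only:    {mrdimm_tbps:.2f} TB/s")
print(f"HBM3E + MRDIMM: {total_tbps:.2f} TB/s ({increase:.1f}x)")
```

With those inputs the MRDIMM-only figure comes to about 1.6 TBytes/sec and the combined figure lands close to 4x that, in line with the estimate.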

The recommended customer price for Sapphire Rapids Xeon Max with 64 GBytes of raw HBM was $2K to $3K higher than the price of the processor with DIMMs-only. A future Xeon Max could have 96 GBytes of raw HBM3E with the usable HBM3E capacity reduced by the bits needed for optional ECC. This future Xeon Max could have a customer price of $3K to $5K higher than the MRDIMM-only version. Considering the price of high-end x86 processors, including HBM3E makes economic and technical sense because it would provide a 3x to 4x performance increase in HPC applications for less than a 3x to 4x system price increase.

Sapphire Rapids Xeon Max had some problem that limited the total HBM bandwidth to about 1 TByte/sec. That problem, whatever it is, needs to be fixed on a future Xeon Max processor.

By: Timothy Prickett Morgan https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235829 Thu, 26 Sep 2024 23:15:29 +0000 https://www.nextplatform.com/?p=144739#comment-235829 In reply to Slim Albert.

Same here, and that is what I thought he meant, too.

By: Slim Albert https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235819 Thu, 26 Sep 2024 21:48:15 +0000 https://www.nextplatform.com/?p=144739#comment-235819 In reply to Matt Jones.

Fair point! I interpreted it as Singhal wanting to emphasize that “Granite Rapids [(GR)] focuses more on power reduction in many ways than IPC uplift”. The Phoronix benchmarks on GR power-efficiency seem to bear out his point, where, with 128 cores, GR consumed 650 Watts on average (my reading of the “central” vertical lines in their bars), while Xeon Max 9468 (48 cores) and 9480 (56 cores) consumed between 600 and 620 Watts on average (my reading again). And so, GR runs more than twice as many P-cores, giving it at least 1.8x the oomph of the Xeon Maxes, while consuming less than 10% more juice. That gives GR a power consumption per core similar to that of the EPYC 9684X in my reading: GR has 1.3x as many cores, and consumes 1.3x the power on average. ( https://www.phoronix.com/review/intel-xeon-6980p-power/7 )
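
The core and power ratios above are easy to sanity-check. The sketch below uses only the eyeballed wattages quoted in this comment, plus one number not stated in it: the EPYC 9684X core count of 96. The implied EPYC power is simply back-calculated from the 1.3x figure, not a measurement.

```python
gr_cores, gr_watts = 128, 650            # Granite Rapids, as read off the charts
xeon_max_cores = {"9468": 48, "9480": 56}
xeon_max_watts_range = (600, 620)        # quoted range for the Xeon Max parts
epyc_cores = 96                          # EPYC 9684X core count (not in the comment)

# How many times more cores GR has than each Xeon Max part
core_ratio = {name: gr_cores / cores for name, cores in xeon_max_cores.items()}

# Extra power GR draws relative to each end of the Xeon Max range
extra_power = [gr_watts / watts - 1 for watts in xeon_max_watts_range]

epyc_core_ratio = gr_cores / epyc_cores
implied_epyc_watts = gr_watts / 1.3      # back-calculated from the 1.3x power claim

for name, ratio in core_ratio.items():
    print(f"GR vs Xeon Max {name}: {ratio:.2f}x the cores")
print(f"extra power vs Xeon Max: {extra_power[1]:.1%} to {extra_power[0]:.1%}")
print(f"GR vs EPYC 9684X: {epyc_core_ratio:.2f}x the cores, "
      f"implied EPYC draw about {implied_epyc_watts:.0f} W")
```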

By: HuMo https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235710 Thu, 26 Sep 2024 03:25:57 +0000 https://www.nextplatform.com/?p=144739#comment-235710 In reply to HuMo.

Oh, and as for that Sleeping Beauty hearsay, it’s not all secretive mystery hocus-pocus really, smoke and mirrors and all, just hearing between the words in the Argonne interview podcast at InsideHPC, while getting around (round, round) like a youthful Beach Boys surfing the Internets (48 minutes, worth a listen: https://insidehpc.com/2024/09/hpcpodcast-an-aurora-exascale-update-and-other-hpc-topics-with-argonnes-rick-stevens-and-mike-papka/ ) … q^8

By: Matt Jones https://www.nextplatform.com/2024/09/24/intel-shoots-granite-rapids-xeon-6-into-the-datacenter/#comment-235693 Wed, 25 Sep 2024 23:38:13 +0000 https://www.nextplatform.com/?p=144739#comment-235693 You quote a Senior Intel Fellow and chief architect of Xeon 6 as saying “if my internal team comes to me and offers me a core with 5 percent IPC and a core with 15 percent IPC, which is better for Xeon? The answer is it depends on other parameters, particularly power. If the 5 percent IPC option costs me 0 percent more power but the 15 percent IPC option costs 30 percent more power, then on average the two options are about the same in a power-constrained world and one is likely less complex.”

Those two options are definitely not “about the same in a power-constrained world”. For the second option, the IPC increases by less than the power increases. If the power can’t be increased, the second option would result in a reduction of performance by 1.15/1.3 = .88 because the processor frequency or number of cores would have to be reduced by 12%. The first option results in a 5% increase in performance so the performance difference between the two options on a power-constrained chip is 12% + 5% = 17%. It appears that the Senior Intel Fellow took 30% of 15% to conclude the second option is about the same as a 5% improvement in performance per power, which is completely wrong.

I think the Senior Intel Fellow was trying to make the point that an IPC increase is only beneficial if the IPC increase is more than the power increase on a power-constrained chip. A design with an IPC increase of 1.15x would have to increase power by less than 1.15x to justify the increased design complexity. An IPC increase of 1.15x with a power increase of 1.1x provides a 1.15/1.1 = 1.05x improvement in performance per power. If a simpler design also provides the same 1.05x improvement in performance per power, the simpler design is better.
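
That rule of thumb can be written down as a small check. This is a sketch with made-up function names, assuming performance per power is simply the IPC gain divided by the power gain, as in the paragraph above.

```python
def perf_per_power(ipc_gain: float, power_gain: float) -> float:
    """Relative performance per unit of power for a design change."""
    return ipc_gain / power_gain


def improves_power_constrained_chip(ipc_gain: float, power_gain: float) -> bool:
    """True only if IPC grows faster than power, the break-even rule argued above."""
    return perf_per_power(ipc_gain, power_gain) > 1.0


complex_design = perf_per_power(1.15, 1.10)   # +15% IPC for +10% power
simple_design = perf_per_power(1.05, 1.00)    # +5% IPC for no extra power

print(f"complex: {complex_design:.2f}x, simple: {simple_design:.2f}x")
print(improves_power_constrained_chip(1.15, 1.10))  # True
print(improves_power_constrained_chip(1.15, 1.30))  # False
```

Both designs land at about 1.05x performance per power here, which is the case where the simpler one wins on complexity alone.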
