Comments on: Ethernet Consortium Shoots For 1 Million Node Clusters That Beat InfiniBand https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Wed, 26 Jun 2024 14:02:18 +0000
By: Timothy Prickett Morgan https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-215605 Fri, 27 Oct 2023 12:31:46 +0000 https://www.nextplatform.com/?p=142670#comment-215605 In reply to Rakesh Cheerla.

Different animals for sure. Maybe we have networks that are being asked to do wildly divergent things, and AI and HPC can’t piggyback on the same interconnect?

By: Rakesh Cheerla https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-215548 Thu, 26 Oct 2023 00:28:18 +0000 https://www.nextplatform.com/?p=142670#comment-215548 Timothy Prickett Morgan,
What does 1 million nodes “roughly” translate to in terms of flows for AI (elephant flows) vs. HPC (mice flows)? Different animals, right?

By: Hubert https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211589 Mon, 24 Jul 2023 21:28:17 +0000 https://www.nextplatform.com/?p=142670#comment-211589 In reply to Timothy Prickett Morgan.

I stand corrected (thanks to TNP)!

By: Timothy Prickett Morgan https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211582 Mon, 24 Jul 2023 19:05:39 +0000 https://www.nextplatform.com/?p=142670#comment-211582 In reply to Luis River.

No, Spectrum-4 is Ethernet. Not InfiniBand. Mellanox has not converged its ASICs since the SwitchX days back in 2011, a move it later reversed, saying it was a bad idea because it increased latency for InfiniBand.

By: Luis River https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211580 Mon, 24 Jul 2023 18:07:17 +0000 https://www.nextplatform.com/?p=142670#comment-211580 Please, is Mr. Hubert a misinformed guy?

By: Adir Zevulun https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211551 Mon, 24 Jul 2023 04:50:34 +0000 https://www.nextplatform.com/?p=142670#comment-211551 In reply to Luis river.

Nvidia Spectrum-4 is not IB.
It is a 51T Ethernet switch.

By: ⟨φ|8^p|ψ⟩ https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211535 Sun, 23 Jul 2023 18:24:29 +0000 https://www.nextplatform.com/?p=142670#comment-211535 In reply to Hubert.

The Eternal Battle Between InfiniBand And Ethernet In HPC (2021)

By: Luis river https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211516 Sun, 23 Jul 2023 02:09:26 +0000 https://www.nextplatform.com/?p=142670#comment-211516 In reply to Hubert.

IB Spectrum-4 seems very nice!

By: Hubert https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211480 Sat, 22 Jul 2023 06:07:07 +0000 https://www.nextplatform.com/?p=142670#comment-211480 In reply to Mark Hahn.

I think that the 2019 Slingshot article (linked in the 3rd paragraph) nicely supports your point — essentially showing that an eth-oriented approach can sustain performant RDMA/RoCE and tail-cutting congestion management. But it did take time and effort to develop and demonstrate, and so, beforehand, niche had to be the law of the land as it were.

Meanwhile, in the “Look mom, no InfiniBand” El Reg article of 05/29/23 we read that: “at COMPUTEX Huang announced the SPECTRUM-4 […] switch that marries Ethernet and InfiniBand, with a 400GB/s BlueField 3 SmartNIC” (also in TNP, e.g. searching for Bluefield), and so there’s competitive impetus at nV as well to evolve IB in interesting ways.

By: Mark Hahn https://www.nextplatform.com/2023/07/20/ethernet-consortium-shoots-for-1-million-node-clusters-that-beat-infiniband/#comment-211452 Fri, 21 Jul 2023 18:44:23 +0000 https://www.nextplatform.com/?p=142670#comment-211452 Single-source is a critical flaw, especially now that it’s part of a vertically-integrated company. And to be frank, what has IB done for us lately? Sure, phy designers have provided the means for IB to step up bandwidth.

There have been a lot of “X challenges IB” stories, but the fact is that IB remains quite niche. That is, those challengers have managed to limit IB’s TAM and thus overall importance.

I wonder if everyone would agree that IB’s choice to ignore eth compatibility was a historic mistake.
