Comments on: The Hidden Cost Of Compromise: Why HPC Still Demands Precision

By: Calamity Jim

Calamity Jim — Sun, 16 Feb 2025 05:08:30 +0000

Quite so! This could be quite useful to help navigate the environmental regulation rodeo as explained by BASF when presenting their new 3 PetaFlop/s Quriosity — “the world’s largest supercomputer used in industrial chemical research”. Those reconfigurable FP64 NextSilicon-branded Maverick-2 chips could get them to assess the “Potential impact of crop protection products on groundwater quality” in even less than a few hours (instead of years), or through more complex environment models: https://www.basf.com/global/en/who-we-are/innovation/how-we-innovate/our-RnD/Digitalization_in_R-D/supercomputer

I reckon it’d be quite a sight to see the rancheros at Corteva (Dow/Elanco-Dupont), Bayer-Monsanto, and Syngenta, down the Arbuckle and saddle-up to join this here computational steer wrestling contest, for everyone’s improved environmental protection, with great crop production!

By: Slim Albert

Slim Albert — Sat, 15 Feb 2025 01:59:16 +0000

A nice addition to Tim’s recent article on many tech details of Maverick-2 ( https://www.nextplatform.com/2024/10/29/hpc-gets-a-reconfigurable-dataflow-engine-to-take-on-cpus-and-gpus/ )!

Also, thanks for linking to “A recent study from Oak Ridge National Laboratory” where Table 3 shows that a 10x speedup with MxP only occurs (so far at least) for dense matrix situations, while the more relevant sparse matrix computations may see just 1.5x (e.g. CFD and Climate/Weather in their Table 2, and I expect contaminant transport and environmental flows, unfortunately).

Maverick-2’s mill cores that automatically identify computational hotspots and optimize the corresponding compute graphs (reducing data movement overhead among others) sound like quite the ticket imho. I like the 4x better perf-per-watt vs GPUs and hope it gets realized in sparse computations (e.g. both direct and iterative solvers). In particular, I expect the distributed HBM3E approach, with dynamic reconfiguration of compute, to help out with the HPCG and Graph500 memory access challenges (here in HPC-relevant FP64, rather than the more common FP32 of dataflow devices).

Looking forward to seeing some conference presentations or papers related to Sandia’s NNSA Vanguard Penguin Tundra testing of this quite promising ASIC ( https://www.sandia.gov/research/news/sandia-partners-with-nextsilicon-and-penguin-solutions-to-deliver-first-of-its-kind-runtime-reconfigurable-accelerator-technology/ )!

By: Mike Harris

Mike Harris — Fri, 14 Feb 2025 12:48:22 +0000

I have some friendly suggestions for NextSilicon. There should be a liquid-cooled version of the Maverick-2 PCIe card that is compatible with Supermicro’s liquid-cooled workstation. When NextSilicon is ready to release the specifications of Maverick-2, add a “Tech Specs” tab to the NextSilicon website so potential customers can quickly find the specs of this product without any marketing fluff. Maverick-2 should support Compute Express Link Type 2 (CXL Type 2) to provide cache coherent access to host memory and the HBM on Maverick-2. NextSilicon should have a Maverick card supporting PCIe Gen 6 and CXL for sale by 2026 to align with the launch of AMD’s Venice processor and Intel’s Diamond Rapids processor.

A cloud service should be created that allows potential customers to easily upload source code and evaluate the performance of Maverick-2. The runtime limit could be 60 seconds. The uploaded source code should be prevented from accessing the internet or the local file system. The uploaded source code should be able to print a maximum of 5000 characters. Potential customers would submit a job to a queue and see the results on the website when the job completes. No registration or email address should be required.
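The two numeric limits proposed above (a 60-second runtime and 5000 characters of output) could be enforced along these lines. This is only a minimal sketch: `run_limited` and the constant names are hypothetical, nothing here reflects an actual NextSilicon service, and blocking internet and file system access would need OS-level isolation (containers or similar), which this sketch does not attempt.

```python
import subprocess
import sys

RUNTIME_LIMIT_S = 60      # runtime limit suggested in the comment above
MAX_OUTPUT_CHARS = 5000   # output cap suggested in the comment above


def run_limited(cmd, timeout_s=RUNTIME_LIMIT_S, max_chars=MAX_OUTPUT_CHARS):
    """Run cmd, stop it after timeout_s seconds, return (status, truncated stdout).

    Hypothetical helper: it enforces only the time and output limits,
    not the network or file system restrictions.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # Partial output may arrive as bytes or str depending on the platform.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "timeout", partial[:max_chars]

    status = "ok" if result.returncode == 0 else "error"
    return status, result.stdout[:max_chars]


if __name__ == "__main__":
    # Stand-in for a submitted job: a child process that prints too much.
    status, output = run_limited([sys.executable, "-c", "print('x' * 10000)"])
    print(status, len(output))
```

A real queue would put something like this behind a job scheduler and show the returned status and text on the results page.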

The NextSilicon website should have detailed documentation describing the microarchitecture of Maverick-2, especially the memory subsystem. The latency, bandwidth and size of each level in the memory hierarchy should be listed. Techniques for optimizing the performance of Maverick-2 should be described. When Maverick-2 becomes available for sale, there should be links on the NextSilicon website to webpages where the product can be bought.

NextSilicon should work with HPC software providers, like Q-Chem, to get their applications running on Maverick-2. The NextSilicon website should show the performance of these real-world HPC applications on Maverick-2 and alternative platforms. Maverick-2 should provide at least a 3x better price/performance ratio on real-world HPC applications than the best available NVIDIA GPUs and x86 CPUs.

By: Eric Olson

Eric Olson — Fri, 14 Feb 2025 04:05:43 +0000

It’s a good point that the double precision used for scientific computation is getting overshadowed by quarter and half precision AI hardware. The dream of a plug-in card that automatically accelerates Fortran and C codes, with CUDA and ROCm compatibility planned, sounds wonderful.
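To make the precision point concrete, here is a small sketch that accumulates the same value many times while rounding the running total to half (FP16), single (FP32), or double (FP64) precision. It is a software emulation using Python's `struct` module, not a measurement on any hardware, and the helper names `round_to` and `accumulate` are made up for illustration. Quarter precision (FP8) is left out because `struct` has no format for it.

```python
import struct


def round_to(fmt, x):
    """Round a Python float (FP64) to the precision of a struct format.

    'e' is IEEE half precision, 'f' is single, 'd' is double.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(fmt, term, n):
    """Add term to a running total n times, rounding after every addition."""
    total = 0.0
    rounded_term = round_to(fmt, term)
    for _ in range(n):
        total = round_to(fmt, total + rounded_term)
    return total


if __name__ == "__main__":
    exact = 10.0  # 10000 additions of 0.001
    for name, fmt in (("FP16", "e"), ("FP32", "f"), ("FP64", "d")):
        total = accumulate(fmt, 0.001, 10000)
        print(f"{name}: sum = {total!r}, abs error = {abs(total - exact):.3e}")
```

In low precision the running total eventually grows large enough that the added term is smaller than half the spacing between representable values, so further additions are rounded away. Long accumulations of this kind are one reason scientific codes tend to stay in FP64.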

Where are the benchmarks, feeds and speeds?

For example, it would be interesting to see to what extent, if any, SPECfp benefits from Maverick-2 acceleration.