Comments on: Fujitsu To Fork Arm Server Chip Line To Chase Clouds
https://www.nextplatform.com/2023/03/15/fujitsu-to-fork-arm-server-chip-line-to-chase-clouds/

By: HuMo | Sat, 18 Mar 2023 14:11:46 +0000
In reply to HuMo.

This (above) also echoes Eric’s comment on “Intel Pushes Out Hybrid CPU-GPU […]” (credit where credit is due).

By: HuMo | Thu, 16 Mar 2023 18:48:34 +0000

A forked approach to progress from A64FX seems reasonable, with the hope of continued cross-fertilization, and neither branch ending up overly “less traveled”. As Paul mentioned some time ago, A64FX is quite elegant in enabling a single ISA to be used for both scalar and vector ops in the CPU (no need to program an external group of GPUs), for example (ARM assembly with SVE extensions):

ld1d { z1.d }, p0/z, [x0, x4, lsl #3]   // load/gather vector from mem into z1
fmla z2.d, p0/m, z1.d, z0.d             // vector multiply z0 & z1, accumulate into z2
st1d { z2.d }, p0, [x1, x4, lsl #3]     // store/scatter vector z2 back to mem
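
For readers who prefer C, here is a minimal sketch of roughly the same y[i] += a * x[i] kernel written with ACLE SVE intrinsics, showing how scalar control flow and the vector math share one ISA on the CPU. The register roles, array names, and the extra load of the accumulator are my own assumptions about what the three-line excerpt above is doing, not something from the original:

#include <arm_sve.h>   // needs an SVE-enabled compiler, e.g. gcc/clang with -march=armv8.2-a+sve
#include <stdint.h>

/* Hedged sketch: predicated daxpy-style loop, analogous to the ld1d/fmla/st1d
 * sequence above; names and the y-load are illustrative, not Fugaku code. */
void daxpy_sve(double *y, const double *x, double a, uint64_t n)
{
    svfloat64_t va = svdup_n_f64(a);                // broadcast scalar a (like z0)
    for (uint64_t i = 0; i < n; i += svcntd()) {    // step by vector length in doubles
        svbool_t pg = svwhilelt_b64_u64(i, n);      // predicate covering the loop tail
        svfloat64_t vx = svld1_f64(pg, x + i);      // ld1d: load x vector
        svfloat64_t vy = svld1_f64(pg, y + i);      // load the accumulator vector
        vy = svmla_f64_m(pg, vy, vx, va);           // fmla: vy += vx * va
        svst1_f64(pg, y + i, vy);                   // st1d: store result back
    }
}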

This natively integrated ability for vector scatter/gather and vector ops gives A64FX (or Fugaku really) its advantage in HPCG’s sparse matrix memory access kung-fu (Fujitsu’s jiu-jitsu?) relative to split CPU-GPU architectures. AVX-512 may also do this, of course, for the relevant alternative arch. For the dense matrix karate of HPL (simple block-like mem access), the CPU+GPU wins out owing to the much greater oomph provided by its very numerous vector/matrix computational engines. So, the fork leading to Fugaku-Next may consider pairing A64FX with NEC vector processors (of previous Next Platform articles) to improve on that, especially if they do mixed precision well (Fugaku is 3rd on HPL-MxP, behind the much smaller LUMI). Meanwhile, the future stacked caches mentioned here should help it maintain its top spot on HPCG (most relevant to FD/FEM/CFD) against the split competition.
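
To make the HPCG point concrete, here is a minimal, generic CSR sparse matrix-vector product sketch (an illustration of the access pattern only, not Fugaku’s actual HPCG code): the x[col[j]] indirection is exactly the irregular gather that A64FX’s SVE gather loads can service in-core, whereas a split CPU-GPU design has to keep such data resident on the accelerator or pay for transfers.

#include <stddef.h>

/* Hedged illustration: generic CSR SpMV, the kernel shape that dominates
 * HPCG-style sparse workloads. The indirect x[col[j]] reads are the
 * gather-heavy traffic discussed above; all names are illustrative. */
void spmv_csr(size_t nrows, const size_t *rowptr, const size_t *col,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (size_t j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[col[j]];   // indirect (gather) access to x
        y[i] = sum;
    }
}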

Ironically for ARM’s “power-sipping” reputation, Fugaku’s 60MJ/EF, while acceptable I guess, is beyond Frontier’s 20MJ/EF (and Grace+Hopper’s possibly 15MJ/EF). It looks like the Monaka fork could bring this down to a much more attractive 18MJ/EF (60/1.7/2.0), but, long-term, additional precise slicing of the power budget, as per Zatoichi’s most awesome blind swordsmanship, will likely be required (a proverbial knife for either fork). Speaking of slicing, there is a whiff of administrative seppuku in the French government this evening (PM), as protesters metaphorically celebrate Dr. Antoine Louis’ 300th anniversary, in a “contre nous de la tyrannie” (La Marseillaise) response to the forced passage of the pension reform. Regardless, I would love to see deeper collaboration between Fujitsu-RIKEN and the musketeers of EuroHPC on this promising architecture, which has already proved its worth, focused on sufficiently distant and uplifting goals such as 10EF and 100EF.
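
As a back-of-envelope check on those MJ/EF figures (energy per exaflop of work is just power divided by sustained exaflop/s), here is a tiny sketch; the ~30 MW and ~0.5 EF/s Fugaku inputs and the 1.7x/2.0x Monaka gain factors are assumptions read off the comment itself, not official numbers:

#include <stdio.h>

/* Hedged arithmetic only: MJ/EF = MW / (EF/s), since 1 MW = 1 MJ/s. */
int main(void)
{
    double power_mw = 30.0;         // assumed Fugaku power draw, MW
    double sustained_efs = 0.5;     // assumed sustained performance, EF/s
    double mj_per_ef = power_mw / sustained_efs;        // ~60 MJ/EF
    double monaka = mj_per_ef / (1.7 * 2.0);            // ~18 MJ/EF projection
    printf("Fugaku ~%.0f MJ/EF, Monaka projection ~%.0f MJ/EF\n", mj_per_ef, monaka);
    return 0;
}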
