I don’t know too much about it, but according to people who do, these things are ultra-specialized and essentially worthless for anything other than AI-type work:
anything post-Volta is literally worse than worthless for any workload that isn’t lossy low-precision matrix bullshit.
H200’s can’t achieve the claimed 30TF at FP64, which is a less than 5% gain over the H100. FP32 gains are similarly abysmal.
The B100 and B200? <30TF FP64.
Contrast with AMD Instinct MI200 @ 22TF FP64, and MI325X at 81.72TF for both FP32 and FP64. But 653.7TF for FP16 lossy matrix. More usable by far, but still BAD numbers. VERY bad.
AI isn’t even the first or the twentieth use case for those operations.
All the “FP” figures are about floating-point precision. Higher precision, FP64 especially, matters more for training and finely detailed models. Integer-based matrix math comes up plenty often in optimized cases, which are becoming more and more the norm, especially with China’s research on shrinking models while retaining accuracy metrics.
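If it helps to see the precision point concretely, here is a rough stdlib-only Python sketch. It rounds one value to FP16/FP32/FP64, then does a dot product with int8-quantized values. The helper names (`round_to`, `quantize`, `int8_dot`), the sample numbers, and the simple per-vector scaling are all made up for illustration; real quantization schemes and GPU kernels are more involved than this.

```python
import struct

def round_to(fmt, x):
    """Round a Python float (FP64) to a struct format and back again."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

value = 0.1
fp16 = round_to("e", value)  # IEEE 754 half precision
fp32 = round_to("f", value)  # IEEE 754 single precision
fp64 = round_to("d", value)  # IEEE 754 double precision (round-trips unchanged)

print(f"FP16 rounding error: {abs(fp16 - value):.2e}")
print(f"FP32 rounding error: {abs(fp32 - value):.2e}")
print(f"FP64 rounding error: {abs(fp64 - value):.2e}")

def quantize(vec, scale):
    """Map floats to integers in [-127, 127] (illustrative symmetric scheme)."""
    return [max(-127, min(127, round(v / scale))) for v in vec]

def int8_dot(a, b):
    """Dot product accumulated in integers, rescaled to a float at the end."""
    scale_a = max(abs(v) for v in a) / 127
    scale_b = max(abs(v) for v in b) / 127
    qa, qb = quantize(a, scale_a), quantize(b, scale_b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only multiply-accumulate
    return acc * scale_a * scale_b

# Hypothetical sample data, not taken from any real model.
weights = [0.12, -0.53, 0.88, 0.07]
inputs = [1.4, -2.0, 0.25, 3.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(weights, inputs)
print(f"float dot product: {exact:.4f}")
print(f"int8 dot product:  {approx:.4f}")
```

The first half shows how much detail each format throws away on a single number. The second half shows why integer matrix math can be good enough for already-trained models: the result lands close to the float answer while only doing small-integer multiplies in the inner loop.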
https://weird.autos/@rootwyrm/115361368946190474