Seems like an odd detail. Is it safe to assume that this language is meant for i...

Culonavirus · on July 3, 2023

Probably a case of "no one complained hard enough yet", also probably a case of "beggars can't be choosers"... when literally billions in VC money is poured into both AI startups and established cloud computing companies, all of that flows directly into Nvidia's pockets and it's not like this area is free from vendor lock-in. When you have a stack that requires Nvidia's hardware you are going to pay for Nvidia's hardware. We live in a time when any hardware with a "matrix calculation accelerator" label sells like hot cakes. It's a massive bubble (HN doesn't like when you use that word in combination with AI, but that's what it is), but as with any bubble, people don't care, they want to ride that wave while it's there. But to get back to the issue, anything Nvidia makes will sell right now, it's just a matter of who is going to be able to buy it first. So no one really complains about some of Nvidia's marketing being a little dishonest. Also, even if people cared, being a trillion dollar company on your way to being one of the most valuable companies on the planet, you have a lot of options and money for litigation.

kklimonda · on July 3, 2023

It's always been like this in GPU space - all reviews have always mentioned the number of compute units (be it SMs or "CUDA cores"), and the total available for the given architecture is also known. A lot can be told about relative performance of two cards based on that, so this information is useful not only to investors.

rob74 · on July 3, 2023

AFAIK it's been like that in CPU space too - e.g. that 6-core CPUs are actually 8-core CPUs with 2 cores deactivated, either because of defects or because they needed more 6-core CPUs?

numpad0 · on July 3, 2023

It's always like that in consumer semiconductors. Intel has something like 3 to 5 actual silicon variants per generation that cover all of the dozen or two SKUs.

pclmulqdq · on July 3, 2023

This sort of yield-enhancement-by-binning extends to almost every form of semiconductor, from amplifiers to server CPUs.

nomel · on July 3, 2023

Sure, but Intel doesn't advertise the number of dead cores.

twic · on July 3, 2023

The article says:

> We’re testing H100’s PCIe version on Lambda Cloud, which enables 114 of those SMs, 50 MB of L2 cache, and 10 HBM2 memory controllers. The card can draw up to 350 W.

> Nvidia also offers a SXM form factor H100, which can draw up to 700W and has 132 SMs enabled.

So I wonder if the number of enabled elements is due to a power supply or cooling constraint.

raverbashing · on July 3, 2023

Very possible, including some PCI limits (even though you probably have an auxiliary port)

It's also possible the yields are not so great and then you have a limited number of good SMs per chip

formerly_proven · on July 3, 2023

H100 PCIe has 114/144 SMs enabled

H100 SXM5 has 132/144 SMs enabled. Also higher clocks, much higher TDP.
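
For a rough sense of scale, the enabled fractions can be worked out as in the sketch below. The SM counts are the ones quoted above; the names `FULL_DIE_SMS`, `ENABLED_SMS` and `enabled_fraction` are made up for illustration, and SM count alone says nothing about the clock and TDP differences.

```python
# SM counts as quoted in this thread; 144 is the full-die total.
FULL_DIE_SMS = 144
ENABLED_SMS = {"H100 PCIe": 114, "H100 SXM5": 132}  # illustrative names


def enabled_fraction(enabled: int, total: int = FULL_DIE_SMS) -> float:
    """Return the fraction of the die's SMs that are enabled."""
    return enabled / total


for name, sms in ENABLED_SMS.items():
    # Print each variant's enabled share of the full die as a percentage.
    print(f"{name}: {sms}/{FULL_DIE_SMS} = {enabled_fraction(sms):.1%}")
```

That comes to roughly 79% of the die for the PCIe card and roughly 92% for SXM5.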

nomel · on July 3, 2023

My confusion is as to why the "/144" is needed there, rather than just the lone numbers, 114 and 132, especially since the missing pieces are, more than likely, defective. How can knowing this number help anyone? Perhaps it's transparency, "132 is the best we have now. Best possible is 144, so don't save your money, buy this one!"

formerly_proven · on July 4, 2023

It's typically mentioned as a yield or product placement reference, because the same silicon is often used for a range of models and if the full line-up isn't released yet the number of disabled units gives hints as to which other products are likely to exist, how they perform and how they are priced.

pclmulqdq · on July 3, 2023

The fusing is supposedly due to the power envelope on the PCIe cards. Practically, it could be a market segmentation / yield enhancement trick, which would be the most nefarious explanation, but I assume it's mostly due to the power envelope.