> By the time you've shoehorned in an nVidia GPU and all that RAM, you're easily...

stego-tech · on March 5, 2025

You're forgetting what Apple's been baking into their silicon for (nearly? over?) a decade: the Neural Processing Unit (NPU), now called the "Neural Engine". That's their secret sauce that makes their kit more competitive for endpoint and edge inference than standard x86 CPUs. It's why I can get similarly satisfying performance on my old M1 Pro Macbook Pro with a scant 16GB of memory as I can on my 10900k w/ 64GB RAM and an RTX 3090 under the hood. Just to put these two into context, I ran the latest version of LM Studio with the deepseek-r1-distill-llama-8b model @ Q8_0, both with the exact same prompt and maximally offloaded onto hardware acceleration and memory, with a context window that was entirely empty:

  Write me an AWS CloudFormation file that does the following:
  
  * Deploys an Amazon Kubernetes Cluster
  * Deploys Busybox in the namespace "Test1", including creating that Namespace
  * Deploys a second Busybox in the namespace "Test3", including creating that Namespace
  * Creates a PVC for 60GB of storage

The M1Pro laptop with 16GB of Unified Memory:

  * 21.28 seconds for "Thinking"
  * 0.22s to the first token
  * 18.65 tokens/second over 1484 tokens in its responses
  * 1m:23s from sending the input to completion of the output

The 10900k CPU, with 64GB of RAM and a full-fat RTX 3090 GPU in it:

  * 10.88 seconds for "thinking"
  * 0.04s to first token
  * 58.02 tokens/second over 1905 tokens in its responses
  * 0m:34s from sending the input to completion of the output

Same model, same loader, different architectures and resources. This is why a lot of the AI crowd are on Macs: their chip designs, especially the Neural Engine and GPUs, allow quite competent edge inference while sipping comparative thimbles of energy. It's why if I were all-in on LLMs or leveraged them for work more often (which I intend to, given how I'm currently selling my generalist expertise to potential employers), I'd be seriously eyeballing these little Mac Studios for their local inference capabilities.

kllrnohj · on March 5, 2025

Uh.... I must be missing something here, because you're hyping up Apple's NPU only to show it getting absolutely obliterated by the equally old 3090? Your 10900K having 64gb of RAM is also irrelevant here...

stego-tech · on March 5, 2025

You're missing the the bigger picture by getting bogged down in technical details. To an end user, the difference between thirty seconds and ninety seconds is often irrelevant for things like AI, where they expect a delay while it "thinks". When taken in that context, you're now comparing a 14" laptop running off its battery, to a desktop rig gulping down ~500W according to my UPS, for a mere 66% reduction in runtime for a single query at the expense of 5x the power draw.

Sure, the desktop machine performs better, as would a datacenter server jam-packed full of Blackwell GPUs, but that's not what's exciting about Apple's implementation. It's the efficiency of it all, being able to handle modern models on comparatively "weaker" hardware most folks would dismiss outright. That's the point I was trying to make.

kllrnohj · on March 5, 2025

We're talking about the m3 ultra here, which is also wall powered and also expensive. Nobody is interested in dropping upwards of $10,000 on a Mac Studio to have "okay" performance just because an unrelated product is battery powered. Similarly saving a few bucks on electricity to triple the time the much, much more expensive engineer time spent waiting on results is foolish

Also Apple isn't unique in having an NPU in a laptop. Fucking everyone does at this point.

stego-tech · on March 6, 2025

It almost feels like you're deliberately missing the forest for the trees, in order to fit some argument that I'm not quite able to sus out here.

The point is that, in terms of practical usage, the M3 Ultra is uniquely competent and highly affordable in a sea of enterprise technology that is decidedly not. I tried to demonstrate why I'm excited about it by pointing out the similar performance of a battery-powered, four-year-old laptop and a quite gargantuan gaming PC that's pulling over 500W from the wall, as an example of what several years of additional refinements and improvements to the architecture was expected to bring.

The point is that it's affordable, more flexible in deployment, and more efficient than similarly-specced datacenter servers specifically designed for inference. For the cost of a single decked-out Dell or HP rackmount server, I can have five of these Mac Studios with M3 Ultra chips - and without the need for substantial cooling, noise isolation, or other datacenter necessities. If the marketing copy is even in the same ballpark as actual performance, that's easily enough inference to serve an office of fifty to a hundred people or more, depending on latency tolerances; if you don't mind "queuing" work (like CurrentCo does with their internal Agents), one of those is likely enough for a hundred users.

That's the excitement. That's the point. It's not the fastest, it's not the cheapest, it's just the most balanced.

seec · on March 6, 2025

Apple defenders have some special sauce reasoning that makes no sense to anyone but them. Are you a boomer?

I have Apple hardware but it sucks for anything AI, buying it for that purpose is just extremely dumb, just like buying Macs for engineering CADs or things of the sort.

If you are buying Macs and it's not for media production related reasons you are doing something wrong.

stego-tech · on March 7, 2025

> Apple defenders have some special sauce reasoning that makes no sense to anyone but them. Are you a boomer?

I continue to be in awe of the lengths some people will go just to fling insults and shake out some salt. We're, what, ten layers deep? With all the context above, the best you have to contribute to the discussion are baseless accusations and ageist insults?

Your finite time would have been better spent on literally anything else, than actively seeking out a comment just to throw subjective, unsubstantiated shade around. C'mon, be better.

seec · on March 10, 2025

Makes no mistake, it's not an insult. I'm saying that precisely because I have been there.

Apple is the master at creating desire and building narrative in their customers' mind about the many things their devices would allow them to do. It's very aspirational and in practice most of the Macs get used for things that could have been done with a much cheaper option.

It may not be obvious to you but it's somewhat funny seeing you rationalise all kinds of dreams of what this machine could potentially be when in practice the people who would really be working on the kind of stuff you are talking about don't even consider them viable for many good reasons.

It's not that those machines cannot potentially do it, it's just that they don't really fit the goal very well.

A lot like people buying Cybertruck to "haul" stuff when they are a lot more option that are just plain better and make a lot more economic/practical sense.

It's OK to desire the thing and be excited about it but it really doesn't serve anyone to rationalise it so hard, you are lying to yourself as much as everyone else, it's not healthy.

If that was not clear, people working on AI stuff professionally really don't have to deal with a Mac Studio, they have access to better stuff. If you want to get one personally to experiment/toy around it's ok but it's not going to be this amazing thing for AI.

kiratp · on March 5, 2025

10K doesn’t get you 512 GB of VRAM in Nvidia land.

sgt101 · on March 6, 2025

indeed, it does not.

I am thinking that 7 A100's would be the lowest price for that, and that would be $80k with good discounts.

rbanffy · on March 5, 2025

Had the M3 GPU been much wider, it would be constrained by the memory bandwidth. It might still have an advantage over Nvidia competitors in that it has 512GB accessible to it and will need to push less memory across socket boundaries.

It all depends on the workload you want to run.