I've found LLMs perform surprisingly well here if you target CSG or OpenSCAD. It seems to frame the 3D modeling challenge as a logic and syntax problem rather than a spatial one, which plays to the model's strengths. You avoid the spatial hallucinations common in image generation because it's effectively just writing code.
How much VRAM would this require, if I would want to run this locally?
I bought a 12GB Nvidia card a year ago. In general I'm having a hard time finding the actual hardware requirements for any self-hosted AI model. Any tips/suggestions/recommended resources for that?
One quick way to estimate a lower bound is to take the number of parameters and multiply it by the bytes per parameter. So a model with 7 billion parameters stored in an 8-bit (1-byte) format would need ~7 GB just to load the weights. The attention mechanism requires more on top of that, depending on the size of the context window.
You'll also need to load inputs (images in this case) onto the GPU memory, and that depends on the image resolution and batch size.
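The arithmetic above can be sketched in a few lines. The numbers are illustrative, not measured; real usage adds KV cache, activations, and (here) image inputs on top of this lower bound:

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Lower-bound GPU memory just to hold the weights, in GB."""
    return params_billions * bits_per_param / 8

# A 7B model at 8 bits/param needs ~7 GB for weights alone.
print(weight_memory_gb(7, 8))   # 7.0
print(weight_memory_gb(7, 16))  # 14.0 at float16
print(weight_memory_gb(7, 4))   # 3.5 at a 4-bit quant
```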
I use LMStudio for running models locally (macOS) and it tries to estimate whether the model would fit in my GPU memory (which is the same thing as main memory for Macs).
The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.
Note that file sizes are normally given in raw bytes (GB rather than GiB). I've downloaded dozens of models from Hugging Face, and the difference always works in favour of the VRAM size in GiB.
12GB will be sufficient to run a quantized version, provided you're not running anything else memory-hungry on the GPU.
You're not finding hardware specs because there are a lot of variables at play - the degree to which the weights are quantized, how much space you want to set aside for the KV cache, extra memory needed for multimodal features, etc.
My rule of thumb is 1 byte per parameter to be comfortable (running a quantization with somewhere between 4.5 and 6 bits per parameter and leaving some room for the cache and extras), so 7 GB for 7 billion parameters. If you need a really large context you'll need more; if you want to push it you can get away with a little less.
There aren't any because it depends a lot on what your use case is, what speed you expect, how accurate you want it to run, how many users want to use it, and how much context size you need.
- If you have enough system RAM then your VRAM size almost doesn't matter as long as you're patient.
- For most models, running them at 16-bit precision is a waste unless you're fine-tuning. The difference from Q8 is negligible, and Q6 is still very faithful. In return, the quantized versions need less memory and run faster.
- Users obviously need to share computing resources with each other. If this is a concern then you need, at a minimum, enough GPUs to ensure the whole model fits in VRAM; otherwise all the loading and unloading will royally screw up performance.
- Maximum context length is crucial to think about, since the context has to be stored in memory as well, preferably in VRAM. The number of concurrent users therefore plays a role in which maximum context size you can offer. It is also possible to offload the cache to system RAM or to quantize it.
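To make the context-length point concrete, here is a rough KV-cache estimate. The architecture numbers in the example (32 layers, 8 KV heads, head dimension 128) are assumed, Llama-3-8B-like values, not specs from any model discussed above:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, n_users: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size: a K and a V tensor per layer,
    per token, per user (fp16 elements by default)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * n_users * bytes_per_elem
    return total_bytes / 1e9

# One user at a 32k context, fp16 cache: roughly 4.3 GB on top of the weights.
print(round(kv_cache_gb(32, 8, 128, 32768, 1), 2))
```

Doubling either the context length or the number of concurrent users doubles this figure, which is why serving setups quantize or offload the cache.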
Rule of thumb: budget 1.5*s, where s is the model size at the quantization level you're using. By that measure an 8B model is a good fit for a 12GB card, which is the main reason this is such a common size class for LLMs.
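That rule of thumb is easy to turn into a quick check. This is just the heuristic from this thread, not a guarantee; real headroom depends on context size and what else is using the GPU:

```python
def fits_in_vram(params_billions: float, bits_per_param: float, vram_gb: float) -> bool:
    """Apply the 1.5*s rule: model size at this quantization, plus 50%
    headroom for KV cache and extras, must fit in VRAM."""
    s = params_billions * bits_per_param / 8  # model size in GB
    return 1.5 * s <= vram_gb

print(fits_in_vram(8, 5, 12))   # True: 1.5 * 5 GB = 7.5 GB fits a 12GB card
print(fits_in_vram(8, 16, 12))  # False: full fp16 would want 24 GB
```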
It's a good reason to use Macs, as they have unified RAM. I have a 48GB MacBook Pro: plenty of memory to run these models, and the M4 Max should be plenty fast. You kind of want enough RAM that there's plenty left to run your normal stuff after the model has loaded.
I wish I had more time to play with this stuff. It's so hard to keep up with all this.
Ah, 5x? At $WORK, the low-code tool vendor used to build our monolith (and our sister company's) was bought by a private equity firm. Our sister company will face a 7x increase. Another fun thing is that the license fee is based on a percentage of the licensing cost to their customers.
Their game is clearly to squeeze very hard for a few years, and then deprecate the product. I can't imagine that there are companies that are fine with such price hikes.
Wow. In the EU, canceling subscriptions must be as easy as signing up for them by law.
My personal longest-running ISP issue was when a software config change went wrong on their side. It took me a month of almost daily phone calls until I reached 4th-line support, an actual techie who fixed it in 10 minutes.
Which tool did you use to create that busy_spec.html file? It reminds me of Engelbart's blue numbering system for documents, if I remember the name correctly.
It's https://github.com/rochus-keller/crossline/, a tool which I implemented and used for many years in my projects. It's inspired by Netmanage Ecco and implements features which can also be found in Ted Nelson's Xanadu or in Ivar Jacobson's Objectory.
Very nice! I talked (~10 years ago) at a Nix meetup to a company that did over-the-air updates of bicycle-shed usage signs in NL, running on embedded Linux systems. One of their bigger challenges was that their downlink quota was very limited. I suggested they take a look at bsdiff; not sure if they ever got to it (or maybe they got a better downlink ;))
I might give this idea another go myself with this nice Rust library. With some heuristics one could partition the recursive closure of dependencies in a way that optimizes for reuse (e.g. try to compute shared subtrees). Probably more efficient than a plain tar of, say, the entire root file system of a buildroot Android system.