Hacker News | elsombrero's comments

Not good enough for coding anything more than simple scripts.

Generally, the fewer the parameters, the less knowledge they have.


A custom provider for the Kubernetes cluster autoscaler, aimed at homelabs, that lets you turn nodes on and off without reprovisioning them.

https://github.com/homecluster-dev/homelab-autoscaler

https://autoscaler.homecluster.dev

Works with any mechanism for turning nodes on and off (IPMI, WoL, ...). I have some nodes that I power on and off via a curl to Home Assistant, which toggles their power plug.


On my 2x 3090s I am running GLM-4.5 Air at Q1; it runs at ~300 t/s prompt processing and 20-30 tk/s generation. It works pretty well with Roo Code in VS Code, rarely misses tool calls, and produces decent-quality code.

I also tried it with Claude Code via claude-code-router, and it's pretty fast. Roo Code uses bigger contexts, so it's generally slower than Claude Code, but I like the workflow better.

This is my snippet for llama-swap:

```
models:
  "glm45-air":
    healthCheckTimeout: 300
    cmd: |
      llama.cpp/build/bin/llama-server
      -hf unsloth/GLM-4.5-Air-GGUF:IQ1_M
      --split-mode layer --tensor-split 0.48,0.52
      --flash-attn on
      -c 82000 --ubatch-size 512
      --cache-type-k q4_1 --cache-type-v q4_1
      -ngl 99 --threads -1
      --port ${PORT} --host 0.0.0.0
      --no-mmap
      -hfd mradermacher/GLM-4.5-DRAFT-0.6B-v3.0-i1-GGUF:Q6_K
      -ngld 99 --kv-unified
```


Thanks, but I find it hard to believe that a Q1 model would produce decent results.

I see that the Q2 version is around 42GB, which might be doable on 2x 3090s, even if some of it spills over to CPU/RAM. Have you tried Q2?


Well, I tried it and it works for me. LLM output is hard to evaluate properly without actually using the model.

I read a lot of good comments on r/LocalLLaMA, with most people suggesting Qwen3 Coder 30B-A3B, but I never got it to work as well as GLM-4.5 Air at Q1.

As for Q2: it will either fit in VRAM with a very small context, or spill over into RAM with quite an impact on speed depending on your setup. I have slow DDR4 RAM, so going with Q1 has been a good compromise for me, but YMMV.
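A quick back-of-the-envelope check illustrates the tradeoff, assuming 24 GB per 3090 and the ~42 GB Q2 size mentioned above (the remaining headroom has to cover KV cache, activations, and CUDA overhead):

```python
# Rough VRAM budget for 2x RTX 3090 (24 GB each) vs. the ~42 GB
# Q2 GGUF size mentioned in the thread. Whatever is left over must
# hold the KV cache, activations, and CUDA runtime overhead.
VRAM_GB = 2 * 24          # total VRAM across both cards
q2_weights_gb = 42        # approximate GLM-4.5 Air Q2 GGUF size
headroom_gb = VRAM_GB - q2_weights_gb
print(headroom_gb)        # prints 6
```

Six gigabytes is tight once you want a large context, which is why the weights end up spilling into system RAM in practice.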


What is llama-swap?

I've been looking for more details about software configs on https://llamabuilds.ai


https://github.com/mostlygeek/llama-swap

It's a transparent proxy that automatically launches your selected model with your preferred inference server, so you don't need to manually start and stop the server when you want to switch models.

So, say I have configured Roo Code to use Qwen3 30B-A3B as the orchestrator and GLM-4.5 Air as the coder: Roo Code calls the proxy with model "qwen3" in orchestrator mode, and when it switches to the coder, llama-swap kills the qwen3 llama.cpp instance and restarts llama.cpp with "glm4.5air".
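A minimal sketch of what such a two-model llama-swap config might look like (model names and GGUF repos here are illustrative, not my actual setup); whichever `model` name the client sends is the one llama-swap launches, stopping the other first:

```
models:
  "qwen3":
    cmd: |
      llama.cpp/build/bin/llama-server
      -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M
      --port ${PORT}
  "glm45-air":
    cmd: |
      llama.cpp/build/bin/llama-server
      -hf unsloth/GLM-4.5-Air-GGUF:IQ1_M
      --port ${PORT}
```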


You could apply a binary search on each slider, and reduce the number of tries by moving the slider by half of the shortest distance to an edge.
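A sketch of the idea in Python, assuming the puzzle gives you left/right feedback after each try (the function and parameter names here are hypothetical; the real game's interface may differ):

```python
# Binary search over a slider's positions, assuming we learn after
# each try whether the target is left or right of the current guess.
def find_position(target: int, lo: int = 0, hi: int = 100):
    tries = 0
    while lo < hi:
        guess = (lo + hi) // 2   # jump to the midpoint of what's left
        tries += 1
        if guess == target:
            return guess, tries
        if guess < target:       # feedback: target is to the right
            lo = guess + 1
        else:                    # feedback: target is to the left
            hi = guess - 1
    return lo, tries             # interval collapsed to a single spot
```

With a 0-100 slider this needs at most ceil(log2(101)) = 7 tries instead of up to 100 linear steps.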




On the installation page they just say to download the prebuilt binaries for your system; no need to compile it yourself.


I have had a OnePlus 8T since last October and have not found any issues on the battery side. On the contrary, I'm quite pleased with the battery management, since I can charge it from 0 to almost 100% in 30 minutes; usually I just charge it for 5-10 minutes and can easily use it all day.

On a full charge I get a day or two on average without recharging, with mostly Firefox, YouTube, and messaging apps open, plus some light gaming.

Battery saving mode gets me almost an hour more of usage when I'm at 15%.

I haven't really had any signal problems, and 5G isn't much of an improvement where I live, so it's not really a concern for me, but YMMV.


Sure, negotiating your salary is an important skill, but I don't see what value this article adds over its title alone.

And then there's this

> Here’s what you do:

>

> Read this book.

> Do what it says.

Bad self-help marketing fluff piece without any substance to it...


And fifty bucks, too; maybe that's the lesson.


Usually you catch a general exception and throw another one that's caught further up the call stack. In this case I think it may be useful for logging additional causes that wouldn't be obvious from the stack trace alone(?)
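A sketch of that wrap-and-rethrow pattern in Python (the exception class and function names are illustrative): `raise ... from` records the original exception as `__cause__`, so a logger printing the traceback still shows the root cause alongside the higher-level error.

```python
# Wrap a low-level error in a domain-level one; `raise ... from e`
# chains the original exception as __cause__, preserving it for
# logging and tracebacks. Names here are illustrative.
class ConfigError(Exception):
    pass

def load_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as e:
        raise ConfigError(f"invalid port value: {raw!r}") from e

try:
    load_port("eighty")
except ConfigError as err:
    root_cause = err.__cause__          # the original ValueError
    print(type(root_cause).__name__)    # prints ValueError
```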


Link to the original paper

https://arxiv.org/abs/1808.10250

