Hacker News | brcmthrowaway's comments

Unless it's GPU

How much of LLM improvement comes from regular ChatGPT usage these days?

You're just chatting yourself out of a job.

If we don't need plasma physicists anymore then we probably have fusion reactors or something, which seems like a fine trade. (In reality we're going to want humans in the loop for the foreseeable future.)

Giving the right answer: $1

Asking the right question: $9,999


Exactly. Wikipedia should be hosted on IPFS.

Nothing on volumetrics.

Well, do I ever have a treat for ya!

https://voxel.wiki/wiki/references/



Intel contributes to Linux, how is this a problem?

Wrong level of abstraction. NUMA is an additional layer. If the program (script, whatever) was written with a monolithic CPU in mind then the big picture logic won't account for the new details. The kernel can't magically add information it doesn't have (although it does try its best).

Given current trends I think we're eventually going to be forced to adopt new programming paradigms. At some point it will probably make sense to treat on-die HBM distinctly from local RAM and that's in addition to the increasing number of NUMA nodes.
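To make "accounting for NUMA in your code" concrete, here is a minimal sketch of picking the closest remote node from a distance matrix. The matrix format mirrors what `numactl --hardware` reports; the values and the helper are illustrative, not any particular API:

```python
# Hypothetical sketch: choosing an allocation target from a NUMA
# distance matrix, given as {node: {node: distance}}.
def nearest_node(distances, cpu_node):
    """Return the remote node with the lowest reported distance
    from cpu_node."""
    remote = {n: d for n, d in distances[cpu_node].items() if n != cpu_node}
    return min(remote, key=remote.get)

# Two-socket example: local access is reported as 10, remote as 21
# (typical relative values in `numactl --hardware` output).
distances = {
    0: {0: 10, 1: 21},
    1: {0: 21, 1: 10},
}
print(nearest_node(distances, 0))  # -> 1 (the only remote node here)
```

With on-die HBM exposed as its own node, the same matrix simply grows another row with a much smaller distance entry, which is one way the "new paradigm" could stay backwards-compatible.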


Yes exactly.

The kernel does try to guess as well as it can, though. Many years ago I hit a fun bug in the kernel scheduler triggered by NUMA process migration, i.e. the kernel moving processes to the core closest to their RAM. In some cases the migrated processes never got scheduled again and were stuck forever.

Disabling NUMA migration removed the problem. I figured out the issue thanks to the excellent "A Decade of Wasted Cores" paper, which essentially said that on "big" machines like ours funky things could happen scheduling-wise, so I started looking at scheduling settings.

The main NUMA-pinning performance issue I was describing was different, though, and as you said came from us needing to change how the code was written to account for the distance to the RAM stick. Modern servers will usually let you choose anything from fully managed (hope and pray, single zone) to many zones, and then, depending on what you've chosen to expose, you use it in your code. As always: benchmark, benchmark, benchmark.


Guessing this is especially hard to automate with peripherals involved. I once had a workload slow down severely because it was running on the NUMA node that didn't share memory with the NIC.
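On Linux the kernel does expose a PCI device's NUMA attachment via sysfs, so a launcher can at least check before pinning. A hedged sketch (the `sysfs_root` parameter exists only to make the function testable; real code would read `/sys/class/net` directly, and `-1` means the kernel reports no affinity):

```python
import pathlib

def nic_numa_node(ifname, sysfs_root="/sys/class/net"):
    """Return the NUMA node a NIC is attached to, or None if the
    kernel does not report one (a value of -1 means 'no affinity')."""
    path = pathlib.Path(sysfs_root) / ifname / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    return node if node >= 0 else None
```

Pinning the workload (e.g. via `numactl --cpunodebind`) to whatever node this returns would have avoided the cross-node NIC traffic described above.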

Isn't high-grade SSD storage pretty much a memory layer as well these days, since the difference is no longer several orders of magnitude in access time and throughput but only one or two (compared to the last layer of memory)?

Optane was supposed to fill that gap, but Intel never found a market for it.

Flash is still extremely slow compared to RAM, even modern flash, especially in a world where RAM is already very slow and your CPU already spends much of its time waiting for it.

That being said, you should consider RAM/flash/spinning disk to all be part of one storage hierarchy with different constants and tradeoffs (volatile or not, big or small, fast or slow, etc.), and knowing these tradeoffs will help you design simpler and better systems.


Sort of? Relative to 6 or more channels of RAM it's still quite abysmal, but perhaps high-bandwidth flash will change how things are done.
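Rough, commonly cited ballpark latencies make the "orders of magnitude" question easy to check. These figures are assumptions for illustration, not measurements:

```python
# Ballpark random-access latencies (assumed, order-of-magnitude values):
dram_ns = 100           # DRAM: ~100 ns
nvme_ns = 100_000       # good NVMe flash: ~100 us
hdd_ns = 10_000_000     # spinning disk: ~10 ms

print(nvme_ns / dram_ns)  # ~1000x: flash is still ~3 orders slower than DRAM
print(hdd_ns / nvme_ns)   # ~100x between flash and spinning disk
```

So in latency terms the RAM-to-flash gap is still closer to three orders of magnitude than one; it's sequential throughput where modern NVMe has narrowed the gap most.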

Often the Linux scheduling improvements come a year or two after the chip. Also, Linux makes moment-by-moment scheduling and allocation decisions that are unaware of the big picture of workload requirements.

Does anyone remember FRAPS?

What is a dumb zone?

When LLMs start compacting, they summarize the conversation up to that point using various techniques. Overall, many of the finer points of the work go missing and can only be retrieved if the LLM is explicitly told to search for them in old logs.

Once you compact, you've thrown away a lot of relevant tokens from your problem solving, and the model becomes significantly dumber as a result. If I see a compaction coming, I ask it to write a letter to its future self, then start a new session by having it read the letter.

There are some days where I let the same session compact 4-5 times and just use the letter-to-future-self method to keep it going with enough context, because resetting context also resets my brain :)

If you're ever curious, in Claude you can read the new initial prompt after compaction and see how severely things get cut down. It's very informative about what it forgets and deems unimportant. For example, I have some internal CLIs that are horribly documented, so Claude has to try flags a few times to figure out the specifics, and those corrections always get thrown away; it has to relearn them the next time it wants to use the CLI. If you notice things like that happening constantly, my move is to codify them into my CLAUDE.md, or lately I've been making a small script or MCP server to run the specific flags.
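A minimal sketch of what such a handoff note might look like. The prompt wording and helper below are my own invention, not Claude's actual compaction interface:

```python
# Hypothetical "letter to future self" prompt, sent near the end of
# the context window before compaction kicks in.
HANDOFF_PROMPT = """Before we run out of context, write a letter to your
future self. Include:
- the goal of the next session and where progress currently stands
- any CLI flags, gotchas, or quirks you learned by trial and error
- the tests that must pass for the next step
Save it to NOTES.md so the next session can read it first."""

def handoff(goal, gotchas):
    """Assemble a minimal handoff note from the current session state."""
    lines = [f"Goal: {goal}", "Gotchas:"]
    lines += [f"- {g}" for g in gotchas]
    return "\n".join(lines)

print(handoff("wire up the CLI", ["--json must come before the subcommand"]))
```

The point is that the hard-won corrections (the flag orderings, the quirks) are named explicitly, which is exactly what a generic summary tends to drop.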


Shouldn't compaction be exactly that letter to its future self?

Look at the compaction prompt yourself; in my opinion it's way too short. (I'm running Opus 4.5 most of the time at work.)

From what my colleague explained to me (I haven't 100% verified it myself), the beginning and end of the window matter most to the compaction summary, so a lot of the finer details and debugging context get dropped, which slows down the next session.


What prompt do you use for the letter-to-self? I've been trying that technique myself to manually reset context without losing the important parts (e.g. when it has barked up the wrong tree and I'm sensing that misstep might influence its current generation in a pathological way), but I've not had much success.

It tends to be pretty manual. I mention the goal of the next session, the current stage of progress, the tests for the next steps, and any skills I want it to load next time.

Having a specific goal seems to make a big difference vs. asking it to summarize the session.


If the session was something where it struggled and needed multiple attempts, I have it write down "gotchas" or anything it had to try multiple times.

The letters are usually more detailed than what I see in the compacted prompt.


So you use the letter to itself in addition to the compacted context? I am curious what you ask it to include in the letter and how it is different from a custom instruction passed to /compact?

> I ask it to write a letter to its future self, and then start a new session by having it read the letter

Is that not one of the primary techniques for compaction?


You should run your own experiment: when you see compaction about to start, use the end of your window to have it write a letter first, then let the session compact and compare. I was surprised by how small the compacted message is.

When I tell it to write a letter to itself I usually phrase it like this:

"Write a letter to yourself. Make note of any gotchas or quirks that you learned and make sure to write them down."

It does get those into the letter, but if you check the compaction, a lot of it is gone.


I think the point is that you have a better idea of what you want it to remember, and even a small hint can have a big impact.

Just saying "write up what you know", with no other clues, should not perform any better than generic compaction.


What makes it Neural?

Neural Amp Models are small neural nets that have been trained to emulate the sound of classic guitar amps and effects pedals. The networks are trained on pairs of amp inputs and amp outputs (or effect inputs and effect outputs). Neural Amp Modeler Core is open source, licensed under an MIT license. TONE3000.com provides both free online training of NAM models and a massive collection of community-generated, high-quality Neural Amp Models. NAM has a large and very active online community.

It is obviously not possible to run huge LLMs in realtime, but it's been known for some time that very small neural-network models can run in realtime and produce absolutely stunning simulations of real amps. The models are typically on the order of thousands of weights, not billions. (I'm not actually sure what the weight count is for the A2 Pico models discussed in the original post; OP may be able to help with that.)

These tiny neural network models not only accurately reproduce the sound of the original amps, but also manage to reproduce the feel of playing the original amp. The quality is dramatically better than most previous amp simulations (and entirely competitive with really high-end amp-simulation technologies like Kemper). It is breakthrough technology for guitar pedals and amp simulators in particular, one that literally changes everything in the music industry. The models are also relatively easy to train. TONE3000.com provides free online services for training models, and currently hosts a massive library of thousands of high-quality NAM models, downloadable free of charge.

The particularly interesting part of this report is that a single NAM model can run on a ridiculously tiny embedded processor. OP claims to have a 2nd-generation Pico NAM model running on a 500 MHz Cortex-M7. First-generation Standard NAM models typically require a much beefier processor: an ARM processor in the Pi 4 or Pi 5 range (2.0 GHz Cortex-A72 and 2.4 GHz Cortex-A76), or an x64 processor (an N100-class Intel processor would be a good choice).

(Author of an open source project that uses Neural Amp Modeler Core technology to run NAM models on Raspberry Pis)
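A back-of-the-envelope parameter count shows why these models fit in the thousands-of-weights range rather than billions. The stack below is a toy dilated-conv configuration of my own choosing, not NAM's actual architecture:

```python
# Parameter count for one 1-D conv layer with square channel count:
# weights (out_ch * in_ch * kernel) plus one bias per output channel.
def layer_params(channels, kernel=3):
    return channels * channels * kernel + channels

# Illustrative stack: 16 channels, 10 dilated layers (dilations 1..512
# would give a receptive field of a couple thousand samples).
channels, layers = 16, 10
total = sum(layer_params(channels) for _ in range(layers))
print(total)  # 7840 weights: thousands, not billions
```

At 48 kHz, a few thousand multiply-accumulates per sample is a few hundred million MACs per second, which is why a 500 MHz Cortex-M7 with good DSP instructions is plausible for a trimmed-down model.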


Couldn't this be done with classical methods using deconvolution?

Convolution reverbs use convolution, and can do so because reverb effects are (to a reasonable approximation) linear.

To model amplifiers and effects, you would have to do non-linear deconvolution and re-convolution, which is pretty compute intensive, and pretty challenging to do in realtime. There are a couple of well-known algorithms for doing this; but I'm not aware of anyone that can do it in realtime.
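A two-line check of why plain convolution can't cover this: a linear system obeys superposition, f(a + b) == f(a) + f(b), and an amp's soft clipping does not. Here tanh stands in for the nonlinear amp stage (a common modeling choice, not any specific amp):

```python
import math

def amp(x):
    return math.tanh(x)            # stand-in nonlinear amp stage

lhs = amp(0.5 + 0.5)               # amp applied to the summed signal
rhs = amp(0.5) + amp(0.5)          # sum of individually amplified signals
print(lhs, rhs)                    # ~0.762 vs ~0.924: superposition fails
```

Since an impulse response only characterizes systems where superposition holds, any convolution-based approach caps out at the linear parts (cab, reverb) and something else has to model the clipping, which is the niche the neural nets fill.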


The actual effect transforming the audio is a neural network. The beauty is that you can record someone playing guitar through an ultra-rare amp/filter/whatnot (both clean signal and output) and then train the neural network to replicate that.

How much of ghostty is written by Claude?

Mitchell Hashimoto doesn't need LLMs; LLMs need Mitchell Hashimoto.

This was a great interview with Mitchell: https://youtu.be/WjckELpzLOU

He covers his LLM use too! Highly recommended, and Mitchell's thoughts on open source inspired me to start contributing to projects outside my usual experience.


Ewww, hero worship.

I was more getting at the angle that when people say things like "Wow, I asked AI to code a terminal emulator and it got it mostly right!", it's not because the LLM is amazingly smart purely by inference; it's been trained on the appropriated code of individuals like the above.
