Hacker News | superjan's comments

Is this something you can share in more detail? Did you document a skill for the LLM to use? And with what tasks do you see most improvement?

The point of this is to reduce a complex tool surface to a single SQL query tool without losing the richness of the underlying representation.

In practice this allows me to combine multiple, complex data sources with a constant number of tools. I can add a whole new database without adding a new tool. My prompts are effectively empty aside from metadata around the handful of tools it has access to.

This only seems to perform well with powerful models right now. I've only seen it work with GPT5.x. But, when it does work it works at least as well as a human given access to the exact same tools. The bootstrapping behavior is extremely compelling. The way the LLM probes system tables, etc.

The tasks this provides the most uplift for are the hardest ones. Being able to make targeted queries over tables like references and symbols dramatically reduces the number of tokens we need to handle throughout. Fewer tokens means fewer opportunities for error.
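A minimal sketch of the shape of this, assuming SQLite and invented table names (the real setup presumably spans multiple databases): one generic query tool replaces a tool per data source, and the model bootstraps by probing system tables first.

```python
import sqlite3

# In-memory demo database standing in for a real code-index source;
# "symbols" is an invented table name for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (name TEXT, file TEXT)")
conn.execute("INSERT INTO symbols VALUES ('main', 'app.py')")

def sql_tool(query: str):
    """The single tool the model sees: run SQL, return rows."""
    return conn.execute(query).fetchall()

# Bootstrapping: discover the schema via system tables,
# then issue targeted queries against what was found.
tables = sql_tool("SELECT name FROM sqlite_master WHERE type='table'")
```

Adding another database changes only what those probes discover, not the tool list.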


But what if it gets hacked by the Russians?

Considering the usual behavior of the Russian military, it’ll keep firing at Russian soldiers.

Patrick Boyle (fund manager, professor, youtuber) recently discussed this IPO on his channel. Informative and entertaining.

https://youtu.be/8rS3fTbC7TE?is=TGpEdM2Y7sknP-cW


I used to watch him a lot, but he started talking about AI (I work at a big lab) and it was all wrong, so I'm not sure if I can trust his analysis anymore :(

Unfortunately it's kind of impossible for a YouTuber to make weekly/bi-weekly videos that are actually in-depth to an expert level. The best thing you can do is interview experts, but even then, everyone has their own biases.

*YouTuber

He packages things to present them as analytical, but it's really just clickbait for people to hear something they want to hear. He did a take over a year ago on why the EV revolution crashed, with such gems as presenting slower growth (but still growth) as lower sales. The comment section was full of the never-EV crowd, who got their fix that everything will be alright and that nothing will change. Of course, a year later there were booming sales worldwide.

The sad reality I'm coming to realize is that there is very little real, quality analysis that is critical but looks at the future with open eyes. Most of it is just pandering to crowds. The war in Iran is the latest example: one side says Iran is almost done, the other that it's winning. Who's right? Doesn't matter; being correct is not the point.


Yea. It's hard to tell what's true anymore. I thought Russia would be out of resources in 3 months; it's been 4 years. I thought Rafah would survive; it's completely flattened. I thought global markets would crash after the tariffs; they survived.

I'm convinced we're in some kind of propaganda machine right now.


Propaganda aside (which exists), the world is just an extremely complex place and the people writing these things are taking guesses a lot of the time. That’s it.

[flagged]


What's your issue with the Rafah comment?

Rafah is probably not 100% gone, but it is basically gone. The majority of the people are gone and it's mostly a pile of rubble.

https://en.wikipedia.org/wiki/Rafah#/media/File:An_aerial_vi...

https://www.nytimes.com/interactive/2025/05/15/world/middlee...


[flagged]


wow. usually don't expect that the people i'm writing with are proudly and openly pro-genocide, my bad. we're talking about over a million people, you know.

You're projecting. Village idiot.

[flagged]


You'll be thoroughly disappointed by your own comment history.

I stopped watching him because I don't understand why a competent finance expert is slinging ads for earbuds and quick meals. Feels like he's just making "YouTube content" rather than anything serious.

What was he wrong about?

Gell-Mann amnesia


And despite popular opinion, he is not an AI :)

Also, famous rapper

I found his take on the space data center a bit negative. No idea if it is feasible right now, but you could have made the same jokes and ridicule about the feasibility of electric cars (batteries too weak!) before Elon built Tesla. And Patrick just lists some reasons why it is currently hard to do. I'm by no means an Elon fan, but if anyone could pull it off it's him, and attempting these hard challenges is a good thing.

A space data center is a technical impossibility. And your hero is an idiot, as you can see here when he explains cooling, at the end of the video: https://youtu.be/trgn7s5-YHc

>> before Elon built Tesla.

He bought the company, they already had electric cars.


He bought the company when it was basically nothing lol

About the use of different units: next time you choose a property name in a config file, include the unit in the name. So not “timeout” but “timeoutMinutes”.

Yes!! This goes for any time you declare a time interval variable. The number of times I've seen code changes with a comment like "Turns out the delay arg to function foo is in milliseconds, not seconds".
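A toy illustration of the convention (key names invented): with the unit in the name, every call site is forced to be explicit, and the one conversion happens at the boundary.

```python
# Ambiguous: {"timeout": 30} -- seconds? minutes?
# With the unit in the key, misreading takes effort:
config = {"timeout_seconds": 30, "retry_delay_ms": 250}

def timeout_ms(cfg):
    # convert once, at the boundary, instead of guessing downstream
    return cfg["timeout_seconds"] * 1000
```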

Or require the value to specify a unit.

At that point, you're making all your configuration fields strings and adding another parsing step after the json/toml/yaml parser is done. That's not ideal either: either you write a bunch of parsing code (not terribly difficult, but not something I want to do when I can just not), or you use some time library to parse a duration string, in which case the programming language and time library you happen to use become part of your config file specification, and you have to exactly re-implement your old library's duration parser if you ever want to switch to a new one or re-implement the tool in another language.

I don't think there are great solutions here. Arguably, units should be supported by the config file format, but existing config file formats don't do that.


TOML has a datetime type (both with or without tz), as well as plain date and plain time:

  start_at = 2026-05-27T07:32:00Z  # RFC 3339
  start_at = 2026-05-27 07:32:00Z  # readable
We should extend it with durations:

  timeout = PT15S  # ISO 8601 duration
And like for datetimes, we should have a readable variant:

  timeout = 15s   # can omit "P" and "T" if not ambiguous, can use lowercase specifiers
Edit: discussed in detail here: https://github.com/toml-lang/toml/issues/514

great, now attackers can also target all the libraries needed to enable all that complexity in npm too.

> adding another parsing step after the json/toml/yaml parser is done with it. That's not ideal either

I'd argue that it is ideal, in the sense that it's the sweet spot for a general config file format to limit itself to simple, widely reusable building blocks. Supporting more advanced types can get in the way of this.

Programs need their own validation and/or parsing anyway, since correctness depends on program-specific semantics and usually only a subset of the values of a more simply expressed type is valid. That same logic applies across inputs: config may come from files, CLI args, legacy formats, or databases, often in different shapes. A single normalization and validation path simplifies this.

General formats must also work across many languages with different type systems. More complex types introduce more possible representations and therefore trade-offs. Even if a file parser implements them correctly (and consistently with other such parsers), it must choose an internal form that may not match what a program needs, forcing extra, less standard transformation and adding complexity on both sides for little gain.

Because acceptable values are defined by the program, not the file, a general format cannot fully specify them and shouldn’t try. Its role is to be a medium and provide simple, human-usable (for textual formats), widely supported types, avoid forcing unnecessary choices, and get out of the way.

All in all, I think it can be more appropriate for a program to pick a parsing library for a more complex type, than to add one consistently to all parsers of a given file format.


Another parsing step is the common case. Few parameters represent untyped strings where all characters and values are valid. For numbers as well, you often have a limited admissible range that you have to validate for. In the present case, you wouldn’t allow negative numbers, and maybe wouldn’t allow fractional numbers. Checking for a valid number isn’t inherently different from checking for a regex match. A number plus unit suffix is a straightforward regex.
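For instance, a sketch of such a check in Python (regex, unit set, and function name all invented for illustration): one pattern both validates the value and normalizes it.

```python
import re

# number (no sign, so negatives are rejected) plus a unit suffix
_DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(ms|s|m|h)$")
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}

def parse_duration_ms(text: str) -> float:
    """Parse values like '15s' or '1.5 h' into milliseconds."""
    m = _DURATION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = m.groups()
    return float(value) * _UNIT_MS[unit]
```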

timeoutMs is shorter ;)

You guys can't appreciate a bad joke


Megaseconds are about the right timescale anyway

What megaseconds? They clearly meant the Microsoft-defined timeout.

Well, megaseconds have the nice property that one is about equal to a Scaramucci, so they can be used across domains.

timoutμs is even better. People will learn how to type great symbols.

They wouldn't have to, if the file format accepted floats in proper exponential format.

Yes timout indeed!

not timeout at all is even shorter.

That was a very good summary. One detail the post could use is mentioning that the 4 or 10 experts invoked were selected from the 512 experts the model has per layer (to give an idea of the savings).


From the “silicon valley astronomy lectures”, an excellent overview of current techniques and results for finding and examining exoplanets. By Dr. Bruce Macintosh.


Watched this a few days ago. The video is light on technical details, except maybe that they used CGI to generate training data.


The idea behind a greenscreen is that you can make that green colour transparent in the frames of footage allowing you to blend that with some other background or other layered footage. This has issues like not always having a uniform colour, difficulty with things like hair, and lighting affecting some edges. These have to be manually cleaned up frame-by-frame, which takes a lot of time that is mostly busy work.

An alternative approach (such as the sodium vapor process used on Mary Poppins) is to create two images per frame -- the core image and a mask. The mask is a black-and-white image where the white pixels are the pixels to keep and the black pixels are the ones to discard. Shades of gray indicate blended pixels.
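The blend the mask encodes is ordinary alpha compositing; a toy per-pixel version (invented helper, RGB tuples, alpha in [0, 1]):

```python
def composite(fg, bg, alpha):
    # alpha = 1.0 keeps the foreground pixel, 0.0 keeps the background,
    # and intermediate (gray) mask values blend the two linearly
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))
```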

For the mask approach you are filming a perfect alpha channel to apply to the footage that doesn't have the issues of greenscreen. The problem is that this requires specialist, licensed equipment and perfect filming conditions.

The new approach is to take advantage of image/video models to train a model that can produce the alpha channel mask for a given frame (and thus an entire recording) when just given greenscreen footage.

The use of CGI in the training data allows the input image and mask to be perfect without having to spend hundreds of hours creating that data. It's also easier to modify and create variations to test different cases such as reflective or soft edges.

Thus, you have the greenscreen input footage, the expected processed output, and the alpha channel mask. You can then apply traditional neural net training techniques on the data, using the expected image/alpha channel as the target. For example, you can compute the difference on each of the alpha channel output neurons from the expected result, apply backpropagation to propagate the differences through the network, and then nudge the neuron weights in the computed gradient direction. Repeat that process across the distribution of training images over multiple passes until the network no longer changes significantly between passes.
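That loop can be shrunk to a runnable toy: a one-parameter "network" predicting alpha from a single pixel feature, with the weight nudged down the mean-squared-error gradient (data invented; the real model is of course a deep image network, not a linear fit):

```python
# (feature, target alpha) pairs -- stand-ins for pixels and their CGI-rendered mask
data = [(0.1, 0.05), (0.4, 0.2), (0.8, 0.4), (1.0, 0.5)]
w = 0.0   # the single weight of our toy network: predicted alpha = w * feature
lr = 0.5  # learning rate

for _ in range(200):
    # gradient of the mean squared error between predicted and target alpha
    grad = sum(2 * (w * f - a) * f for f, a in data) / len(data)
    w -= lr * grad  # nudge the weight in the negative gradient direction
```

Since the toy targets are exactly 0.5 times the feature, the weight converges to 0.5.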


On Apple silicon, Parallels can't run x64 Windows; it uses the ARM version of Windows, and the x64 emulation is provided by Windows itself. Of course this is inefficient, but not everything is automatically 2x slower: any OS code you invoke is not running under x64 emulation, and IO and memory access are not penalized by the emulation (though certainly somewhat by virtualization). I was pleasantly surprised by how fast you can run x64 Windows apps.


Yeah, I wasn't aware that Microsoft allowed that nowadays. Still, it's not ideal anyway, because in my experience Windows apps that are compatible with ARM are 90% either FOSS or portable to other platforms anyway. You use Windows to run x86 apps; if you don't need x86 apps you are generally better off not using Windows at all, and if you need them they'll probably run poorly on ARM due to multiple layers of emulation. Wine is still an option, though: it supports Rosetta on Mac and FEX/Box64 on Linux, so it may deliver better performance than Parallels.

> I was pleasantly surprised how fast you can run x64 windows apps

In general, as long as you have a fast enough machine, emulation isn't that bad. Apple was already doing it for 68k code on PPC, and most people didn't notice because of how massively faster their first PPC computers were. Still, the issue is that here we're not really talking about a high-end CPU, are we?


IEEE 754 prescribes, for better or worse, that any mathematical comparison operator (==, <, >, …) involving at least one NaN must always return false, including comparison against itself. This is annoying for something like dictionaries or hashtables. C# has a solution: if you call a.Equals(b) on two floats a and b, it will also return true if both are NaN. I think this is a cool solution: it keeps the meaning of the math operators identical to other languages, but you still have sensible behavior for containers. I believe this behavior is copied from Java.
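The same wrinkle shows up in Python, for illustration: == follows IEEE 754, while dict lookups dodge it with an identity check rather than a special Equals method.

```python
import math

nan = math.nan
assert not (nan == nan)  # IEEE 754: NaN compares unequal even to itself

# Python containers shortcut with an identity check before ==,
# so looking up the very same NaN object still works:
d = {nan: "value"}
assert d[nan] == "value"
```

Note the identity shortcut only helps for the same NaN object; a freshly computed NaN key would still miss.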


I consider this a very bad solution, because it can lead to very subtle bugs.

The correct solution for any programming language is to define all the 14 relational operators that are required by any partially-ordered set, instead of defining only the 6 of them that are sufficient for a totally-ordered set.

If the programming language fails to define all 14 operators, then you must always test the operands for NaNs, before using any of the 6 ALGOL relational operators. If you consider this tedious, then you must unmask the invalid operation exception and take care to handle this exception.

If invalid operations generate exceptions, then the floating-point numbers become a totally-ordered set and NaN cannot exist (if a NaN comes from an external source, it will also generate an exception, while internally no NaN will ever be generated).


Yeah, along those lines we have requirements on never logging PII, and not logging anything that potentially contains PII, such as folder names.


Maybe tokenise the PII part of the folder name when outputting it?

i.e. `$HOME/.config/foo/stuff.cfg` rather than `/home/joebloggs/.config/foo/stuff.cfg`?
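A minimal sketch of that substitution (helper name invented; it only handles the home-directory prefix, not PII elsewhere in the path):

```python
import os

def redact_home(path: str) -> str:
    """Replace the home-directory prefix so logs never carry the username."""
    home = os.path.expanduser("~")
    if path.startswith(home):
        return "$HOME" + path[len(home):]
    return path
```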


Or have an encrypted data portion, so that the sensitive details can be revealed as-needed, and redaction occurs by rotating a key.

Obviously that depends on the messages being infrequent in production logging levels.

