Man, am I getting tired of these articles; we can do without this neurotic, melancholic whining. Maybe it was the title of the article that triggered me, but it reminded me of hearing Douglas Murray read excerpts from "The Strange Death of Europe" in his self-aggrandising, pompous tone.
The author's colleague needed a couple of tries to write a kernel extension, and somehow this means something about programming. If it were not for LLMs I would not have gone back to low-level programming; this stuff is actually getting fun again. Let's check the assembly the compiler produced for the code the LLM produced.
To be clear, I am also having the most fun I've had when it comes to side-projects and even more exploratory things at work. I don't derive all my joy from "Good Code" -- that's silly! I would much rather ship tangible products and features and/or tackle things at home that I wouldn't otherwise.
On the other hand, the other responsibilities of being an engineer have become quite a bit less appealing.
No, it is a hypothesis I formulated here after reading the article. I did a quick check on Google Scholar but didn't hit any results. The more interesting question is: if it's true, what can you do with this information? Maybe it could be a way to evaluate a complete program or a specific heap allocator, as in "how fast does this program reach universality". Maybe this is something very obvious that has been done before; dunno, heap algorithms are not my area of expertise.
Today I thought a lot about this topic and was also trying to find connections to computation. It seems like "computational entropy" could be a useful bridge, in the sense that to derive a low-entropy output from a high-entropy input, it seems intuitively necessary that you'd need to make use of the information in the high-entropy input. In this case you would need to compute the eigenvalues, which requires a certain wrestling with the information in the matrices. So even though the entries of the matrices themselves are random, the process of observing their eigenvalues/eigenvectors has a certain computational complexity involved with processing and "aggregating" that information in a sense.
I realize what I'm saying is very gestural. The analogous context I'm imagining is deriving blue noise distributed points from randomly distributed points: intuitively speaking it's necessary to inspect the actual distributions of the points in order to move the points toward the lower entropy distribution of blue noise, which means "consuming" information about where the points actually are.
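To make the blue-noise intuition concrete, here is a minimal sketch of Mitchell's best-candidate algorithm (my choice of illustration, not something from the comment): each new point is the random candidate farthest from all points placed so far, so the algorithm literally has to inspect where the existing points are.

```python
import random

def best_candidate_points(n, k=16, seed=0):
    """Mitchell's best-candidate sampling: of k random candidates, keep the
    one farthest from every point placed so far. The min-distance scan is
    the step that 'consumes' information about the existing points."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random())]
    for _ in range(n - 1):
        best, best_d = None, -1.0
        for _ in range(k):  # try k random candidates per new point
            c = (rng.random(), rng.random())
            # squared distance to the nearest existing point
            d = min((c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2 for p in points)
            if d > best_d:
                best, best_d = c, d
        points.append(best)
    return points

pts = best_candidate_points(64)
```

Raising `k` spends more computation inspecting the current distribution and gets you closer to true blue noise, which fits the "information consumption" framing above.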
The "random song" thing is similar: in order to make a shuffle algorithm that doesn't repeat, you need to consume information about the history of the songs that have been played. This requirement for memory allows the shuffle algorithm to produce a lower entropy output than a purely random process would ever be able to produce.
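A no-repeat shuffle can be sketched in a few lines; the deque is exactly the "memory" being described (names and sizes here are illustrative):

```python
import random
from collections import deque

def make_shuffler(songs, history_len=None, seed=None):
    """Return a next_song() function that never replays a song until at
    least history_len other songs have played. The history deque is the
    memory that lets the output be lower-entropy than pure random draws."""
    rng = random.Random(seed)
    if history_len is None:
        history_len = len(songs) - 1  # full no-repeat cycle
    history = deque(maxlen=history_len)

    def next_song():
        choices = [s for s in songs if s not in history]
        song = rng.choice(choices)
        history.append(song)
        return song

    return next_song

play = make_shuffler(["a", "b", "c", "d"], seed=42)
first_eight = [play() for _ in range(8)]
```

Drop the deque and you are back to a memoryless process, which will happily play the same song twice in a row.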
So hearing that a "purely random matrix" can have these nicely distributed eigenvalues threw me off for a bit, until I realized that observing the eigenvalues has some intrinsic computational complexity, and that it requires consuming the information in the matrix.
Again, this is all very hunchy, I hope you see what I'm getting at.
Interesting, I did not know that colors-of-noise was related to this; what you say sounds conceptually very similar to how Maxwell's demon connects thermodynamics to information theory.
This is like the difference between an orange and fruit juice. You can squeeze an orange to extract its juices, but that is not the only thing you can do with it, nor is it the only way to make fruit juice.
I use tree-sitter for developing a custom programming language. You still need an extra step to get from CST to AST, but the overall DevEx is much quicker than hand-rolling the parser.
Every time I get a chance to sing Tree-sitter's praises, I take the opportunity. I love it so much. I've tried a bunch of parser generators, and the TS approach is so simple and so good that I'll probably never use anything else. The iteration speed lets me get into a zen-like state where I just think about syntax design, and I don't sweat the technical bits.
Whenever I need to write a parser and don't need the absolute best performance, I reach for the Lua LPeg library. Sometimes I even embed the Lua engine just so I can use it, and then implement the rest in the original language.
Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema that serves as the target language.
I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?
TS returns a tree of nodes; you walk the nodes with a visitor pattern.
I've experimented with using tree-sitter queries for this, but for now not found this to be easier.
Every syntax will have its own CST, but it can target a general AST, if you will. In the end they can both be represented as s-expressions, but you need rules to go from one flavour of syntax tree to the other.
AST is just CST minus range info and simplified/generalised lexical info (in most cases).
In this context you could say that CST -> AST is a normalization process. A CST might contain whitespace and comments, an AST almost certainly won't.
An example: in a CST `1 + 0x1` might be represented differently than `1 + 1`, but they could be equivalent in the AST. The same could be true for syntax sugar: `let [x,y] = arr;` and `let x = arr[0]; let y = arr[1];` could be the same after AST normalization.
You can see why having just the AST might not be enough for syntax highlighting.
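The `1 + 0x1` point is easy to observe with Python's stdlib `ast` module, which plays the role of the normalized AST here (tree-sitter would give you the CST side of the picture):

```python
import ast

# The surface syntax differs (hex literal vs decimal), but after parsing,
# only the numeric value survives: both operands become Constant(value=1).
a = ast.dump(ast.parse("1 + 0x1"))
b = ast.dump(ast.parse("1 + 1"))
print(a == b)  # the two dumps are identical
```

A syntax highlighter, by contrast, needs to know one literal was written `0x1`, which is exactly why it wants the CST.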
As a side project I've been working on a simple programming language, where I use tree-sitter for the CST, but first normalize it to an AST before I do semantic analysis such as verifying references.
I've been using it for semantic chunking in RAG pipelines. Naive splitting is pretty rough for code, but tree-sitter lets you grab full functions or classes. It seems to give much better context quality and keeps token costs down since you aren't retrieving broken fragments.
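The commenter uses tree-sitter for this; to keep a sketch dependency-free I'll swap in Python's stdlib `ast` module, but the idea is the same: chunk at whole top-level definitions instead of fixed character windows. The sample source here is made up for illustration.

```python
import ast

def chunk_python_source(source):
    """Split source into one chunk per top-level function/class,
    so a retriever never serves half a function."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+)
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

code = "def f():\n    return 1\n\nclass C:\n    pass\n"
chunks = chunk_python_source(code)
```

With tree-sitter you would do the equivalent walk over named nodes, which also works for languages Python can't parse.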
N00b question: language parsers give me concrete information, like "com.foo.bar.Baz is defined here". Does tree-sitter do that, or does it say "this file has a symbol declaration for Baz" and, elsewhere in that file, "there is a package statement for 'com.foo.bar'", and then I have to figure that out?
You have to figure this out for yourself in most cases.
Tree-sitter does have a query language based on s-expressions, but it is more for questions like "give me all the nodes that are literals", so that you can, for example, render those in a single draw call. Tree-sitter has incremental parsing, and queries can be restricted to a certain byte range.
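For reference, a literal-gathering query of that shape looks roughly like this (node names depend on the grammar; these are typical of a C-like grammar, so treat them as illustrative):

```scheme
; capture every number and string literal under one name,
; so the editor can batch them into a single draw call
(number_literal) @literal
(string_literal) @literal
```

Running it yields (node, capture-name) pairs rather than resolved symbols, which is the gap the parent comment is pointing at.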
Ah, I think I found the reason why WebAssembly (in a browser or some other sandboxed environment) is not a suitable substrate for near-native performance. It is a very ironic reason: you can't implement a JIT compiler that targets WebAssembly in a sandbox running in WebAssembly. It sounds like an incredibly contrived thing to do, but once speed is the goal, a copy-and-patch compiler is a valid strategy for implementing an interpreter or a modern graphics pipeline.
This is true. A multi-tier JIT compiler requires writable executable memory and the ability to flush the icache. Loading segments dynamically is nice and covers a lot of ground, but it won't be a magic solution for dynamic languages like JavaScript. Modern WASM runtimes already implement a full compiler, linker and JIT compiler in one, almost starting to look like V8. I'm not sure adding in-guest JIT support is going in the right direction.
I just meant that if you think you can't do it in the timeframe, then you are making it too big for yourself. The rules are so loose that you could literally make a programming language that has a single command `run_my_awesome_game()` and fully implement the logic etc. in your language and library of choice. Obviously a trivial/useless example, but take it up a few notches and you could have something interesting. A DSL inside JSON can be very powerful.
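As a taste of how little machinery a JSON-hosted DSL needs, here is a toy interpreter (the `["op", arg, arg]` node format is made up for this sketch):

```python
import json

def evaluate(node):
    """Evaluate a DSL node: a bare number is a literal,
    a list is ["op", arg, ...] applied recursively."""
    if isinstance(node, (int, float)):
        return node
    op, *args = node
    vals = [evaluate(a) for a in args]
    ops = {
        "add": lambda a, b: a + b,
        "mul": lambda a, b: a * b,
        "neg": lambda a: -a,
    }
    return ops[op](*vals)

# The whole "program" ships as plain JSON:
program = json.loads('["add", 1, ["mul", 2, ["neg", 3]]]')
result = evaluate(program)  # 1 + (2 * -3) = -5
```

Swap the lambda table for calls into your engine of choice and `run_my_awesome_game()` is one more entry in `ops`.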
Polders have personhood in some jurisdictions.
The government reclaimed the land from the sea, sold it to multiple people, levies taxes on them, and now the dykes need to be maintained.
This is just legal fiction: technology developed and applied across industries.
The mere concept of water rights implies obligations must lie someplace.
All this talk about reified gods takes away much of how mundane the concept is.
Yes, like renewable energy infrastructure (which China does, and would be highly useful anyway in case generative AI does live up to its promise).
Even if generative AI lives up to its hype, with the current US administration there's no way America is going to lead the race for long. There's just not enough energy available when those in power oppose developing many of the energy projects that make the most economic sense.
After coming down from the shock of learning there are people like you, I was even more amazed that one of the founding engineers of Pixar, and a giant in computer graphics, also has this condition. He even ran a survey that found his artists were more likely to be on the aphantasia spectrum than his managers. Dunno, maybe some people are so driven to create what they cannot think or see.
I’ve heard about that! My partner and I have both been learning to draw this year. I’m pretty decent at drawing observationally / from reference, but I haven’t tried much from memory. I imagine she’d be much better at that side of things. I’ve also noticed I’m not great at coming up with initial ideas or visual concepts, but once I have a topic or direction, I can absolutely run with it.
I also think it makes sense why a lot of software engineers (myself included) have aphantasia. Being “rational” is arguably easier when you’re not influenced by the emotional weight of images. Maybe we’re even less predisposed to PTSD, since we can’t visually relive things in the same way. My mind still races at night like anyone else’s, but it’s all non-visual. Just endless inner monologue instead of a reel of images. Couldn't count sheep if I tried!