That's exactly what I found. Why these files should exist at all? Some other IDEs just have a bunch of highlighting rules based on regular expressions and have a folder of tiny XML grammar files instead of a folder of bloaty shared libraries.
Because it's far more reliable to use proper parsers instead of a bunch of regular expressions. Most languages cannot be properly parsed with regexes.
Those files are compiled tree-sitter grammars, read up on why it exists and where it is used instead of me poorly regurgitating official documentation:
That would waste CPU time and introduce additional delays when opening files.
They could probably lazily install the grammars like neovim does, but as someone who doesn't have much faith in the reliability of internet infrastructure, I'll personally take it...
Just ran `:TSInstall all` in neovim out of curiosity, and the results were predictable:
If disk space is important for your use case, I guess filesystem compression would save far more than just compressing binaries with upx. btrfs+zstd handle those .so well:
$ compsize ~/.local/share/nvim/lazy/nvim-treesitter/parser
Type Perc Disk Usage Uncompressed Referenced
TOTAL 11% 26M 231M 231M
$ compsize /usr/lib/helix/runtime/grammars
Type Perc Disk Usage Uncompressed Referenced
TOTAL 12% 23M 184M 184M
For real parsing a proper compiler codebase (via a language server implementation) should be used. Writing something manually can't work properly, especially with languages like C++ and Rust with complex includes/imports and macros. Newer LSP editions support syntax-based highlighting/colorizing, but if some LSP implementation doesn't support it, using regexp-based fallback is mostly fine.