"By default, hitch has an overhead of ~200KB per connection"
Ouch. This default means 16GB only lets you handle 80k connections. I would hardly call this "scalable". In 2015 you find blog posts left and right showing you how to reach 1 million connections on a single machine with language X or tool Y or framework Z. Maybe the developers should change this default.
"Preallocating memory is usually an opimization for throughput."
Still, 200kB is excessive. A program with buffers no larger than 10kB can _easily_ saturate a 1 Gbit/s NIC. Hitch is designed to handle many concurrent connections, so even if it handled a paltry 10 connections it could easily saturate a 10 Gbit/s NIC with 10kB buffers. If not, then there is a design flaw somewhere.
"Servers can have up to 1TB of ram without becoming overpriced"
This is irrelevant. If a Hitch version had a default overhead of 10kB per connection, it could in theory scale to 20x the number of connections than this version of Hitch, for a given amount of RAM (no matter the amount). Maximizing the use you get out of a given amount of hardware resources should be your priority when writing scalable software.
How do you think the CPU usage would be with 10kB buffer sizes? And since we're throwing numbers around, why stop at 10kB? If we reduce to 1kB, that should give us MUCH MORE connections!!11.
Let me ask a leading question: how much of this do you think is openssl overhead?
Please consider optimising for a real usage scenario, not some fantasy benchmarking setup.
I am not picking my numbers randomly. On a x86/x86-64 Linux kernel, one socket (one connection) will use at least one 4kB physical memory page. So if userland also allocates one or two 4kB pages for its own needs, you need at minimum 8 to 12kB per connection. That's why I quoted ~10kB.
The minimum theoretical memory usage is 4kB per connection: 1 page in kernel space, and nothing in userland (e.g. you use zero-copy to transfer data between sockets or to/from file descriptors).
If you want to handle 1M connections, you can tune this. It will probably be the easiest of many things to tune. Note that 1M connections terminated by stud/hitch is actually 3M sockets: 1M inbound to stud, 1M initiated by stud and 1M terminated by your underlying server. That's a lot of connections on localhost (on the plus side, 127.0.0.1 is a /8)
If you're splitting 10 Gbit/s across 80,000 users, that leaves 125 kbit/s per user. Split it across 1 million users, and it leaves 10 kbit/s per user. Sure, you could have more than 10 Gbit/s of bandwidth from a single server to the Internet in theory -- but at that point I don't think sticking to 16GB of RAM makes much difference.
Millions of connections are usually websocket connections or HTTP keep-alive connections. In those cases there's not much traffic over those connections. Imagine a game server, for example. Latency is more important than bandwidth. 10 kbit/s is enough for many tasks.
I wonder how much of a hit typical websocket use-cases would take from swapping to SSD? For games I'd think one might prefer just using connectionless UDP, though?
https://news.ycombinator.com/item?id=3028741