"By default, hitch has an overhead of ~200KB per connection"
Ouch. This default means 16GB only lets you handle 80k connections. I would hardly call this "scalable". In 2015 you find blog posts left and right showing you how to reach 1 million connections on a single machine with language X or tool Y or framework Z. Maybe the developers should change this default.
"Preallocating memory is usually an opimization for throughput."
Still, 200kB is excessive. A program with buffers no larger than 10kB can _easily_ saturate a 1 Gbit/s NIC. Hitch is designed to handle many concurrent connections, so even if it handled a paltry 10 connections it could easily saturate a 10 Gbit/s NIC with 10kB buffers. If not, then there is a design flaw somewhere.
"Servers can have up to 1TB of ram without becoming overpriced"
This is irrelevant. If a Hitch version had a default overhead of 10kB per connection, it could in theory scale to 20x the number of connections than this version of Hitch, for a given amount of RAM (no matter the amount). Maximizing the use you get out of a given amount of hardware resources should be your priority when writing scalable software.
How do you think the CPU usage would be with 10kB buffer sizes? And since we're throwing numbers around, why stop at 10kB? If we reduce to 1kB, that should give us MUCH MORE connections!!11.
Let me ask a leading question: how much of this do you think is openssl overhead?
Please consider optimising for a real usage scenario, not some fantasy benchmarking setup.
I am not picking my numbers randomly. On a x86/x86-64 Linux kernel, one socket (one connection) will use at least one 4kB physical memory page. So if userland also allocates one or two 4kB pages for its own needs, you need at minimum 8 to 12kB per connection. That's why I quoted ~10kB.
The minimum theoretical memory usage is 4kB per connection: 1 page in kernel space, and nothing in userland (e.g. you use zero-copy to transfer data between sockets or to/from file descriptors).
If you want to handle 1M connections, you can tune this. It will probably be the easiest of many things to tune. Note that 1M connections terminated by stud/hitch is actually 3M sockets: 1M inbound to stud, 1M initiated by stud and 1M terminated by your underlying server. That's a lot of connections on localhost (on the plus side, 127.0.0.1 is a /8)
If you're splitting 10 Gbit/s across 80,000 users, that leaves 125 kbit/s per user. Split it across 1 million users, and it leaves 10 kbit/s per user. Sure, you could have more than 10 Gbit/s of bandwidth from a single server to the Internet in theory -- but at that point I don't think sticking to 16GB of RAM makes much difference.
Millions of connections are usually websocket connections or HTTP keep-alive connections. In those cases there's not much traffic over those connections. Imagine a game server, for example. Latency is more important than bandwidth. 10 kbit/s is enough for many tasks.
I wonder how much of a hit typical websocket use-cases would take from swapping to SSD? For games I'd think one might prefer just using connectionless UDP, though?
https://news.ycombinator.com/item?id=3028741