If it is just a backend, why not port it over to one of the myriad of cloud autoscaling solutions that are out there?
Weighing the opportunity cost of figuring out why only 29 workers are receiving requests against adding new features that generate more revenue seems like a quick decision.
Personally, I just start off that way now. The development load isn't any greater, and the solutions out there are quite good.
Author here. We do and did use autoscaling heavily, but at a certain scale we just ran out of headroom on the smaller instance types we were using. Jumping to much larger instance types meant we would likely never run into those headroom issues again, plus it solved other problems: faster spin-up, better sidecar connection pooling, and a much higher hit rate on per-instance caching.
You were autoscaling a single threaded process. You had 1000 connections coming in and scaling 1000 workers for those connections. Everything was filtered through gunicorn and nginx, which just adds additional latencies and complexity, for no real benefit.
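For what it's worth, gunicorn itself can move past the one-connection-per-process model without any external autoscaler. A sketch of a threaded configuration (`workers`, `worker_class`, and `threads` are real gunicorn settings; the specific numbers are just a common rule of thumb, not a recommendation from the article):

```python
# gunicorn.conf.py -- sketch only. With the default "sync" worker class,
# concurrency equals the worker count, so 1000 connections need 1000
# processes. The "gthread" class multiplexes several requests per process.
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # common sizing heuristic
worker_class = "gthread"
threads = 4  # each worker process now serves 4 concurrent requests
```

With this config, total concurrency is `workers * threads` rather than just `workers`, which is the usual first step before reaching for bigger instances.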
What I'm talking about is just pointing at something like AppEngine, Cloud Functions, etc... (or whatever solution AWS has that is similar) and being done with it. I'm talking about not running your own infrastructure, at all. Let AWS and Google be your devops so that you can focus on building features.
According to the article they have a monolithic Django application, so it will have at least a couple of seconds of start-up time. That is not a good match for Cloud Functions.
Django also has in-memory caches, for example for templates which can be extremely slow (seconds) and CPU intensive to render. So you really don't want to have AWS or Google restart your application on AppEngine whenever they feel like it.
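For anyone hitting the template cost mentioned above: Django ships a cached template loader so each process parses and compiles a template once, then serves it from memory. A trimmed-down `settings.py` fragment (this is Django's documented configuration, not something specific to the article):

```python
# settings.py fragment: wrap the normal loaders in the cached loader so
# template parsing happens once per process instead of once per request.
TEMPLATES = [{
    "BACKEND": "django.template.backends.django.DjangoTemplates",
    "DIRS": [],
    "OPTIONS": {
        "loaders": [
            ("django.template.loaders.cached.Loader", [
                "django.template.loaders.filesystem.Loader",
                "django.template.loaders.app_directories.Loader",
            ]),
        ],
    },
}]
```

This is exactly the in-memory state that makes frequent restarts (as on AppEngine) expensive: every fresh process pays the parse cost again.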
There are a few reasons why this scenario wouldn't be a good fit for cloud functions, but that "couple of seconds start-up time" can be almost entirely removed from the equation by keeping the Django instance alive (all cloud-function-type offerings have a concept of cold and warm starts, and some way to persist state across calls on the same "instance").
I've run Django on AWS Lambda in a scenario that scaled between 25-250 calls per second depending on time of day (for a runtime of 5-30 sec). Moving Django's bootstrapping out of the handler so it stayed warm across calls was very easy.
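The warm-across-calls trick is essentially just module-level initialization: Lambda reuses the module state between invocations on the same container. A minimal sketch of the pattern (`heavy_bootstrap` is a made-up stand-in for Django's setup work, not a real Django or Lambda API):

```python
# Sketch of the Lambda warm-start pattern: expensive setup runs at module
# import time, once per cold start, and is reused across warm invocations.
import time

def heavy_bootstrap():
    # Stand-in for django.setup(): loading settings, apps, template caches.
    time.sleep(0)  # placeholder for the seconds of real bootstrap work
    return {"ready": True}

# Executed once when the container cold-starts, then kept in memory.
APP_STATE = heavy_bootstrap()

def handler(event, context):
    # Warm invocations skip the bootstrap entirely and just serve traffic.
    return {"statusCode": 200, "warm": APP_STATE["ready"]}
```

Anything constructed at module scope (settings, connection pools, compiled templates) survives between calls, so only the first request on a new container pays the start-up cost.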
> unless you have unfixable memory leaks there is no reason to do this.
It's also useful to set this threshold to prevent long-lived connections to services/datastores not used by every request from accumulating and consuming resources on those services.
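For reference, gunicorn exposes exactly this recycling threshold. A sketch of the relevant config (`max_requests` and `max_requests_jitter` are real gunicorn settings; the numbers are illustrative):

```python
# gunicorn.conf.py -- recycle each worker after a bounded number of
# requests. This caps slow memory leaks and drops stale connections to
# backends that only some requests touch.
max_requests = 1000        # restart a worker after ~1000 requests
max_requests_jitter = 100  # randomize the threshold per worker so they
                           # don't all restart in lockstep
```

The jitter matters in practice: without it, workers started together hit the limit together and restart as a thundering herd.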
a) you get to fire the devops person, which saves $150k+ a year.
b) you add appropriate caching layers in front of everything.
c) you spend time adding features, which generate revenue.
I've done all of this before at scale. This whole case study was written about work I did [1]. Two devs, 3 months to release, first year was $80m gross revenue on $500/month cloud bills. Infinite scalability, zero devops.
> you get to fire the devops person, which saves $150k+ a year.
You are deluded or extremely short-sighted if you believe you can actually fire the devops guy. In my experience, the further you stray from the conventional "dedicated server" paradigm, the more you need a devops guy, and you are in a very precarious position if you do fire him and something goes wrong.
You don't hire the devops person until you've scaled to the point that you need one.
Additionally, the thought of having my company held hostage by a single devops person is terrifying. Now you need two of them, which is even more expensive.
It is a great way to bootstrap a company: you save a salary (or two) that can honestly be engineered out of a lot of SaaS businesses. It worked super well for us... and calling someone who did $80m in the first year deluded seems, well, rude.
But if you start off designing systems that scale on their own, you are much better prepared for fast growth than if you have to hire a good devops person (which is extremely hard; as they say, all the good ones are taken).
At the end of the day, the actual elephant in the room is that Django was the wrong choice. You end up having to go through a lot of contortions to make things work, as evidenced by the blog post. The architecture doesn't make things easy to spin up quickly, which creates a lot of bottlenecks. There are better cloud-based solutions.
If you don’t have a devops person, then you end up with developers pitching in to fill that void. That’s OK and may be desirable but it is still a cost.
They are on a back-end that does auto-scaling. They stated that they had problems when scaling up past 1000 nodes.
Now, maybe they could have fixed that issue instead, but going from 29 to 58 workers is easy; it's not the same as going from 29,000 to 58,000. And 1,000 hosts vs. 500 is a non-trivial cost.
You'd probably be worrying more about instance sizes if you ran a single executor per container; the memory overhead of your app would become a problem very quickly unless its startup footprint was quite small.
This doesn’t work so easily with architectures with process pools for workers. So now your app server needs to speak docker (or whatever control plane) to spawn new workers and deal with more complicated IPC. Also the startup time is brutal.
One process per container plus multiprocessing is a huge lift most of the time. I've done it, but it can be a mess because you don't have as much of a handle on containers as you do on subprocesses: you can only poke them at a distance through the control plane.
Do you mean multiprocessing inside the containers? Or are you managing multiprocessing child procs by forking into a container somehow? If the latter, I'd be really interested to learn how to do that; I didn't think it was possible, and it would be super useful for some of what I work on.