> Shards increase the number of failure modes and increase the complexity of those failure modes.
I would only agree with this during the initial implementation of sharding. Once deployed and stable, I have not found this to be the case, at all.
I say this as someone who has directly architected a sharded database layer that scaled to over a trillion rows, and later worked on core automation and operations for sharded databases that scaled to an incalculable number of rows (easily over 1 quadrillion).
In both cases, each company's non-sharded databases were FAR more operationally problematic than the sharded ones. The sharded database tiers behave in common ways with relatively uniform workloads, and the non-sharded databases were each special snowflakes using different obscure features of the database.
> keep it simple, don't shard until you need
I would have wholeheartedly agreed with this until a few years ago. Cloud storage now permits many companies to run monster 10+ TB monolithic relational databases. Technically these companies no longer "need" to shard, possibly ever. But at these data sizes, many operations become extremely painful, problematic, and slow.
> In both cases, each company's non-sharded databases were FAR more operationally problematic than the sharded ones. The sharded database tiers behave in common ways with relatively uniform workloads, and the non-sharded databases were each special snowflakes using different obscure features of the database.
That's because sharded tables restrict what features you can use (e.g., no JOINs). If you constrained the features on the non-sharded databases, you'd achieve the same net result.
Sharding _necessarily_ solves only one problem: queries operate against only a subset of the data. While what you're saying is true (sharding avoids certain problems), it also restricts your ability to perform other operations (more complicated queries or reports become ~impossible). It is not without its tradeoffs.
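To make the "reports are ~impossible" point concrete: any aggregate that spans users can no longer run as a single SQL query; the application has to scatter the query to every shard and gather the partial results itself. A minimal sketch, using in-memory SQLite databases as stand-ins for shards (the table and column names are illustrative assumptions):

```python
import sqlite3

def total_revenue(shard_conns):
    """Cross-shard SUM: scatter the query to every shard, gather in app code."""
    total = 0
    for conn in shard_conns:
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders"
        ).fetchone()
        total += partial
    return total

# Simulate three shards, each holding a disjoint slice of the orders table:
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for i, conn in enumerate(shards):
    conn.execute("CREATE TABLE orders (user_id INT, amount INT)")
    conn.execute("INSERT INTO orders VALUES (?, ?)", (i, 100))

print(total_revenue(shards))  # 300
```

This works for simple sums and counts; anything involving cross-shard JOINs, sorting with LIMIT, or multi-level grouping gets much harder, which is why such reporting usually moves to a separate analytics store.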
Sharded environments still have some JOINs. Typically all data for a single user/customer/whatever should be placed on a single shard, and it's still very useful to join across tables for a single user.
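The reason per-user JOINs keep working is that the routing function sends every row for a given user to the same shard. A minimal sketch of that idea (the shard count, hashing scheme, and schema are illustrative assumptions, not any particular system's implementation):

```python
import hashlib

NUM_SHARDS = 64

def shard_for(user_id: int) -> int:
    """Deterministically map a user_id to a shard.

    Because orders, addresses, etc. are all routed by user_id, every row
    for one user lands on the same shard.
    """
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Since user 42's rows are colocated, this JOIN runs entirely inside one
# shard's database -- no cross-shard coordination needed:
PER_USER_QUERY = """
SELECT o.id, a.city
FROM orders o
JOIN addresses a ON a.user_id = o.user_id
WHERE o.user_id = :user_id
"""
```

Only queries that cross user boundaries lose the ability to JOIN; shard-local queries keep the full feature set.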
> If you constrained the features on the non-sharded databases, you'd achieve the same net result.
No, that's not the main reason for operational benefits. Rather, it's simply because the shards all have a uniform schema, uniform query workload, and relatively small data size (as compared to a large monolithic DB). You can perform operational maintenance at better granularity -- i.e. on a single shard rather than the entire logical data set. And if a complex operation is fine on one shard, it's very likely fine on the rest, due to the uniformity.
For example, performing a DBMS major version upgrade on a large monolithic DB is a nightmare. It's an all-or-nothing affair. If you encounter some unforeseen problem with the new DB version only in prod, you can expect some significant application-wide downtime. Meanwhile for a sharded environment it's both easier and safer from an operational perspective, assuming your team is comfortable automating the process once it has been proven safe. You can upgrade one shard and confirm no problems after a week in prod before proceeding to the rest of the fleet. If unforeseen problems do occur, worst-case it only impacts a tiny portion of users.
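The canary-then-fleet flow described above can be sketched roughly as follows; the function names (`upgrade`, `healthy`) and the one-week bake period are assumptions for illustration, not a real automation API:

```python
import time

def rolling_upgrade(shards, upgrade, healthy, bake_seconds=7 * 24 * 3600,
                    sleep=time.sleep):
    """Upgrade one canary shard, let it bake in prod, then roll out the rest."""
    canary, rest = shards[0], shards[1:]
    upgrade(canary)
    sleep(bake_seconds)  # e.g. one week of real production traffic
    if not healthy(canary):
        # Worst case: only the canary's slice of users is affected.
        raise RuntimeError(f"canary shard {canary} unhealthy; halting rollout")
    for shard in rest:   # proven safe on the canary -> automate the rest
        upgrade(shard)
        if not healthy(shard):
            raise RuntimeError(f"shard {shard} unhealthy; halting rollout")

# Dry run with stand-in shards and checks:
upgraded = []
rolling_upgrade(["s0", "s1", "s2"], upgraded.append, lambda s: True,
                bake_seconds=0, sleep=lambda _: None)
print(upgraded)  # ['s0', 's1', 's2']
```

The key property is the blast radius: a failure at any step stops the rollout with only a fraction of users on the new version, whereas the monolithic upgrade is all-or-nothing by construction.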
> it also restricts your ability to perform other operations (more complicated queries or reports are ~impossible). It is not without its tradeoffs.
Yes, this is why I said above "There are definitely major downsides to sharding, but they tend to be more on the application side in my experience." OP claimed the downsides were operational (e.g. more complex failures or larger downtime), which I do not agree with.
Yes. Yoshinori is among the best of the best, and I could not hope to capture that level of nuance in a quick HN comment :)
But in terms of being a more balanced view, my read of his post aligns pretty closely with my main point in this subthread: the disadvantages of sharding fall more on the application side (limitations on joins, secondary indexes, transactions) than on the operational side (availability, performance and resource management, logical backup, db cloning, replication).