You won’t be throttled if you partition your data “correctly” and have a reasonable amount of data in your bucket. That’s difficult to do, but entirely possible.
We already partition our data correctly, as the AWS solution architects recommend,
but there are limits on:
- the total throughput (100 Gbit/sec);
- the number of requests per second.
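To illustrate what “partitioning correctly” means here: S3 request limits apply per key prefix (AWS documents roughly 3,500 writes and 5,500 reads per second per prefix), so spreading objects across many prefixes multiplies the aggregate request budget. A minimal sketch, assuming a hypothetical `partitioned_key` helper and an arbitrary choice of 16 prefixes:

```python
import hashlib

# Hypothetical sketch: spread object keys across N prefixes so the
# per-prefix S3 request limits (~3,500 writes / ~5,500 reads per
# second per prefix) aggregate instead of bottlenecking on one prefix.
NUM_PREFIXES = 16

def partitioned_key(object_name: str, num_prefixes: int = NUM_PREFIXES) -> str:
    # Hash the object name and derive a short hex prefix from it,
    # so keys distribute evenly rather than piling onto one partition.
    digest = hashlib.md5(object_name.encode()).hexdigest()
    index = int(digest, 16) % num_prefixes
    return f"{index:02x}/{object_name}"

print(partitioned_key("events/2024-01-01/data.parquet"))
```

The exact prefix count is a tuning knob; what matters is that high-traffic objects don’t share a single prefix.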
For example, right now I'm running an experiment: creating 10, 100, and 500 ClickHouse servers and reading data from S3, either from a MergeTree table or from a set of files.
Ten instances saturate the 100 Gbit/sec bandwidth, and adding more brings no further improvement.
JFYI, 100 Gbit/sec is less than what a single PCIe slot with a few M.2 SSDs can deliver.
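The back-of-the-envelope arithmetic behind that comparison, assuming typical PCIe 4.0 numbers (roughly 2 GB/s per lane, x4 lanes per M.2 slot):

```python
import math

# 100 Gbit/sec network cap expressed in bytes per second.
network_gbit = 100
network_gbyte = network_gbit / 8  # 12.5 GB/s

# Assumed PCIe 4.0 figures: ~2 GB/s per lane, a typical M.2 slot is x4.
pcie4_lane_gbyte = 2.0
m2_lanes = 4
ssd_gbyte = pcie4_lane_gbyte * m2_lanes  # ~8 GB/s per drive, theoretical

# How many such drives already exceed the network cap.
drives_needed = math.ceil(network_gbyte / ssd_gbyte)
print(drives_needed)  # 2
```

So two fast NVMe drives already out-run a 100 Gbit/sec link, which is why local-disk reads compare so favorably.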
Ah yes, sorry, you are running it on a single instance. That’s capped, but the aggregate throughput across instances can be much higher.
There are request limits, but those are per partition (key prefix). There is also an undocumented hard cap on LIST requests per second, which is much lower than the one for GET.
Worth noting that ClickHouse is a distributed MPP DBMS: it can scale up to thousands of bare-metal servers and process terabytes (not terabits) of data per second.
It also works on a laptop, or even without installation.
Something must be wrong; at my previous job we were able to achieve much higher aggregate throughput. Are you spreading the instances across AZs, and are you using a VPC endpoint?
We did use multiple VPCs; that might make a difference.