Really interesting!
Just to confirm I understood the basics correctly:
A write means an append to a timeseries, which is keyed by timestamp and identified by some name?
If a write succeeds only partially, different servers have different data. And a read might return any of these versions.
After some time the anti-entropy repair will kick in, and merge the diverging timeseries. Merging means taking the union of all data points.
Where do the timestamps come from, the client? So if a client retries a partially successful write, it'll have the same timestamp and will be merged during repair.
Are timestamps within a timeseries monotonically increasing?
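If I've read the union semantics right, a retried write carrying the same timestamp would just deduplicate during repair. A toy sketch of that model (my own illustration, not InfluxDB code):

```python
# Model each replica's timeseries as a set of (timestamp, value) points.
# Repair merges diverging replicas by taking the union; a retried write
# with the same timestamp and value collapses into a single point.

def merge_replicas(*replicas):
    """Union all points across diverging replicas, sorted by timestamp."""
    merged = set()
    for points in replicas:
        merged |= set(points)
    return sorted(merged)

replica_a = [(1, 10.0), (2, 11.0)]  # received the full write
replica_b = [(1, 10.0)]             # write only partially succeeded here
print(merge_replicas(replica_a, replica_b))  # → [(1, 10.0), (2, 11.0)]
```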
The hinted handoff sounds like it is motivated by a similar problem to the one the Kafka in-sync replica (ISR) set tackles. Do you have any views on the pros/cons of your approach versus ISR sets? I think Kafka uses ZK for the ISR management, which means it wouldn't work with the availability requirements of InfluxDB, but could a modified version work?
So overall InfluxDB is sacrificing lots of consistency for availability.
Since the CP part of the system is actually cached, the entire system is really AP?
If not, what parts are not AP? Modification of the CP part, like creation of new timeseries?
From a user's perspective I could see it being useful to have a historical part of the timeseries that's guaranteed to be stable, and an in-flux part where the system hasn't settled yet. Then one could run expensive analytics on the historical part without having to recalculate everything on the next read, since the data could have changed in the meantime. You're already hashing your data and building a Merkle tree, maybe that would make it possible to implement something like that.
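To illustrate the Merkle-tree idea I'm gesturing at: hash fixed time buckets, pair the hashes up into a root, and compare roots to check whether a historical range has settled. Everything here is a hypothetical sketch, not InfluxDB's actual layout:

```python
# Toy Merkle root over an ordered list of serialized time buckets. Equal
# roots mean the range is identical (settled); differing roots mean some
# bucket in the range has diverged and needs repair or recomputation.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(buckets):
    level = [h(b) for b in buckets]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

old = [b"t0:10", b"t1:11", b"t2:12", b"t3:13"]
print(merkle_root(old) == merkle_root(list(old)))              # True: settled
print(merkle_root(old) == merkle_root(old[:3] + [b"t3:99"]))   # False: diverged
```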
Timestamps should mostly be supplied by the client. They can be in the present or in the past, it doesn't matter.
If a write succeeds only partially, it will most likely be replicated to the other servers (and thus be consistent) by the hinted handoff queue. This should be a fast recovery. Anti-entropy is for much longer-term failures that need to be resolved.
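The hinted-handoff pattern described here can be sketched roughly as follows (a minimal illustration of the general Dynamo-style idea, with invented names, not InfluxDB's implementation): writes that can't reach a replica are queued as hints and replayed when the replica recovers.

```python
# Minimal hinted-handoff sketch: a write to a down replica is queued as a
# "hint" and replayed on recovery, so replicas converge quickly without
# waiting for anti-entropy.
from collections import defaultdict, deque

class Cluster:
    def __init__(self, replicas):
        self.stores = {r: [] for r in replicas}  # replica -> stored points
        self.hints = defaultdict(deque)          # replica -> queued hints
        self.down = set()

    def write(self, point):
        for r in self.stores:
            if r in self.down:
                self.hints[r].append(point)      # hold a hint for the dead node
            else:
                self.stores[r].append(point)

    def recover(self, replica):
        self.down.discard(replica)
        while self.hints[replica]:               # replay hints on recovery
            self.stores[replica].append(self.hints[replica].popleft())

c = Cluster(["a", "b"])
c.down.add("b")
c.write((1, 10.0))      # partial success: only "a" gets the point directly
c.recover("b")          # hint replay brings "b" back in sync
print(c.stores["a"] == c.stores["b"])  # → True
```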
Our use of hinted handoff and our goals for that are just borrowed ideas from Dynamo (the paper not the AWS DB), Cassandra, and Riak.
The issues around consistency are only for the failure cases. During normal operation you'd see a consistent view (within a second or some small delta).
Mostly the system is AP, with some parts that are CP. But if you really examine it, it's neither pure CP nor pure AP. It's some other thing.
That's fine too. However, queries by default set an end time of "now" on the server. So depending on how far in the future the point is, it may not show up in a query unless the end time is explicitly set past the future time of that point.
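The default-end-time behavior described here might look something like this (an illustrative sketch with a made-up `query` helper, not the actual InfluxDB query engine): without an explicit end time, the range is capped at "now" and future-dated points are excluded.

```python
# Sketch of a time-range query whose end time defaults to "now": a point
# written with a future timestamp is hidden until the caller widens the range.
import time

def query(points, start=0.0, end=None):
    end = time.time() if end is None else end    # default end time is "now"
    return [p for p in points if start <= p[0] <= end]

now = time.time()
points = [(now - 60, 1.0), (now + 3600, 2.0)]    # one past, one future point
print(len(query(points)))                        # → 1 (future point hidden)
print(len(query(points, end=now + 7200)))        # → 2 (explicit end reveals it)
```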