My read of this is that the authors believe they have a "novel" candidate set of universal semantics, plus an API to encapsulate what seem to be well-known patterns in this space.
I think this approach is viable in conjunction with reusable components and radically swappable backends: ICG would be the non-moving interface that (per the claims) would allow perfect reuse, with no modification required when the storage model changes.
However, the typical consumer of products in this space is building high-performance, tightly integrated systems. And in terms of contributions to formal definitions, it seems they are introducing neologisms that map fairly one-to-one onto existing terminology and concepts.
> To program with ICG, applications need to wait asynchronously for multiple replies to an operation (where each reply encapsulates a different guarantee on the result) while doing useful work, i.e., speculate. To the best of our knowledge, no abstraction fulfills these criteria. [1]
Operating on (possibly stale) snapshots of an MVCC store? Making changes to a local copy (e.g. pull) of a blockchain (e.g. git)?
Indeed, high-performance and integrated systems form a big chunk of the intended "consumer space" of Correctables. But being integrated or high-perf. does not mean these systems forgo language-level constructs; what I have in mind, as an example of such constructs, is mainly Promises, which are the root abstraction underlying Correctables.
Regarding MVCC: speculation there is done, by default, by the concurrency control system, so applications remain unaware that their state is, at any point, speculative. In other words, the application does not do any useful work while waiting for something; it's the concurrency control system which cleverly overlaps (and sometimes undoes) transactions. Also, the application always observes a single consistency model (isolation) at play.
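To illustrate that point -- the store speculates, the application doesn't -- here's a toy optimistic-concurrency sketch (a deliberate simplification of mine, not any real engine's algorithm): the transaction body only ever sees a snapshot, and validation plus transparent retry happen inside the store.

```python
class MVCCStore:
    """Toy multi-versioned store with optimistic validation: the app's
    transaction body runs against a snapshot; retries on conflict happen
    inside the store, invisibly to the application."""

    def __init__(self):
        self.data = {}       # key -> latest committed value
        self.version = 0     # global commit counter

    def snapshot(self):
        return dict(self.data), self.version

    def commit(self, snap_version, writes):
        if snap_version != self.version:   # a commit slipped in meanwhile
            return False                   # validation fails; store retries
        self.data.update(writes)
        self.version += 1
        return True

    def run(self, txn):
        """Run `txn(snapshot) -> writes` until its snapshot validates."""
        while True:
            snap, v = self.snapshot()
            writes = txn(snap)             # app code: unaware it may re-run
            if self.commit(v, writes):
                return writes

store = MVCCStore()
store.run(lambda snap: {"balance": 100})                   # seed a value
store.run(lambda snap: {"balance": snap["balance"] - 30})  # derived write
print(store.data["balance"])   # 70
```

Note the application-visible surface: `run(txn)` returns once, with a single consistent outcome; the speculative re-execution loop never leaks out.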
Agree with you that working on a git branch is the same as doing useful work on a preliminary view in the ICG model. Git (and likewise PoW-based blockchains) is quite a peculiar application though, isn't it? How many applications -- I'm thinking here mainly about web services -- do we see thriving in this disconnected model of work, where forks are inherently prone to appear? I can imagine many wouldn't be happy to see their e-mail inbox constantly forking and showing them various versions, occasionally dropping e-mails, etc. Or your shopping cart acting like it had multiple personalities. :)
I would also add, as food for thought, that there is a bit of a technical difference between how ICG is meant to operate and how request-replies usually work, which might be relevant for some applications: in ICG, ideally, there should be a single request sent to the service, and that request would resolve in multiple steps, until eventually reaching the correct (strongly-consistent) view. Not going to insist on this point though, since it's a more complicated matter and not too critical.
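For concreteness, a minimal sketch of that single-request, multi-step resolution (the class name, method names, and two-step timing are my invention, not the paper's API): one awaitable handle that fires a callback per preliminary view and only settles `await` on the final, strongly-consistent one.

```python
import asyncio

class Correctable:
    """One logical request whose result is refined in steps:
    preliminary (weakly consistent) views first, then one final
    (strongly consistent) view."""

    def __init__(self):
        self._callbacks = []
        self._final = asyncio.get_running_loop().create_future()
        self.views = []                  # every view delivered so far

    def on_update(self, cb):             # speculate on preliminary views
        self._callbacks.append(cb)

    def _deliver(self, value, final=False):
        self.views.append(value)
        for cb in self._callbacks:
            cb(value)
        if final:
            self._final.set_result(value)

    def __await__(self):                 # awaiting blocks for the final view
        return self._final.__await__()

def read(key):
    """Hypothetical service read: one request, two replies."""
    c = Correctable()

    async def drive():
        await asyncio.sleep(0.01)              # fast, possibly stale reply
        c._deliver((key, "weak"))
        await asyncio.sleep(0.04)              # slow, linearizable reply
        c._deliver((key, "strong"), final=True)

    asyncio.get_running_loop().create_task(drive())
    return c

async def main():
    c = read("x")
    c.on_update(lambda v: print("view:", v))   # do useful work per view
    final = await c                            # block only for the last one
    return c.views, final

views, final = asyncio.run(main())
```

The point of the sketch is the shape of the interface: the caller issues one `read` and the same handle resolves twice, rather than the caller issuing two separate requests.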
Also, I'm not sure what exactly you mean by "candidate set of universal semantics"; I'd be glad to continue our discussion on this path, however.
[Sorry for the late reply, Adrian, in case you read this.]
> Candidate set of universal semantics
More precisely: the semantics and API for a uniform treatment of the full state[-transition] space of an Eventually Consistent Available Partition Tolerant (ECAP) distributed system.
(I would think you will agree that ICG is fully bounded by ECAP and does not concern CP systems. The dual to ICG would be incremental availability guarantees. I'll grant that these two modalities (ICG/IAG) seem to have a wave/particle type of relationship.)
The concern with a performance hit is mainly whether the conceptual framework would prevent categories of optimization.
Hey eternalban,
thanks, very interesting comments!
It's plain to see that you have a deeper / more theoretical perspective on distsys than I tend to have.
Would you describe universal semantics as being "Turing-complete in the context of a distributed system"?
One way to look at it is that Correctables are essentially just wrappers over traditional RPCs. So in this sense they would enable universal semantics -- you can treat the whole space of transitions in a distributed system.
Now it's not clear to me why you need to make the distinction between CP and AP. ICG is orthogonal to these categories. Consider the following example. Say you're using MySQL (a CP system) and you also have a client-side cache (essentially AP). A Correctable can supply results from either or both of these systems: if it's either, then it will manifest as an AP or as a CP system (depending on where the result arrives from -- you can choose); if it's both, then you'll essentially get an AP system, since it's possible that your result from MySQL is forsaken due to a partition.
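A tiny sketch of that two-source read (backend names and latencies are invented for illustration): the cache's reply arrives first and is delivered as the preliminary view, MySQL's arrives later as the final one.

```python
import asyncio

async def read_cache(key):
    await asyncio.sleep(0.01)              # fast, possibly stale
    return ("cache", "v1-stale")

async def read_mysql(key):
    await asyncio.sleep(0.05)              # slow, authoritative
    return ("mysql", "v2-fresh")

async def incremental_read(key, on_view):
    """Deliver each backend's reply as it arrives; return the last one."""
    pending = {asyncio.ensure_future(read_cache(key)),
               asyncio.ensure_future(read_mysql(key))}
    result = None
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            result = task.result()
            on_view(result)                # preliminary view(s), then final
    return result

views = []
final = asyncio.run(incremental_read("x", views.append))
```

If you only ever register one of the two backends, the same interface degenerates into a plain single-reply read from that backend -- which is the sense in which the construct is agnostic to what sits behind it.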
Does this make sense?
> re: IAG /\ ICG
I'd say incremental availability guarantees is a bit of a misnomer, since availability is not such a shaded guarantee as consistency is. I regard availability as a black & white property, whereas consistency is a whole spectrum of colours -- more than that, one which can be represented as a directed acyclic graph. Even so, the concept of IAG would have some practical consequences.
I'll now run the risk of taking it a bit too far, but I think whenever an application chooses a consistency model, that model inherently comes along with its own preset availability guarantee. As consistency models get stronger, there is an instant switch from available to non-available models. [I should cite some refs here but I'm in a hurry.] Consequently, if you have incremental consistency guarantees then you also have decreasing availability guarantees.
Also, your analogy to wave-particle duality is very interesting, though I'm afraid it's above my head to ponder properly. I'll read more about it to see if I can connect the dots.
> Would you describe universal semantics as being "Turing-complete in the context of a distributed system"?
Food for thought! But note: universal in the context of EC-AP. That said, Turing completeness also gives us the halting problem, which in the context of distributed systems seems to map to FLP impossibility and transaction finalization. (Just musing here, please note.) If we're to insist on a robust notion of "guarantees" here, then prudence seems to suggest a weaker computation model is more appropriate.
> availability is not such a shaded guarantee as consistency is. I regard availability as a black & white property, whereas consistency is a whole spectrum of colours.
I definitely see the validity of that point of view, but then I question the insistence that (capability) availability should be seen as all or nothing. For example, what to make of a partially available system? If you are old enough you may remember "all circuits are busy" when making international calls, while your local calls were still available.
Of course (re. colors), when we start looking at availability as a measure, it is not a continuum, but rather a discrete space where progressively richer interaction semantics become available.
> Consider the following example. Say you're using MySQL (a CP system) and you also have a client-side cache (essentially AP).
We're in agreement here. First, let's note it is 'turtles all the way down': e.g. your persistence subsystem at the OS/HW level (which your AP or CP system depends on) is at this point a miniature distributed system, so this dance of availability and consistency (such as whether the OS and the SSD both agree that a certain block has such-and-such value) continues until we hit deterministic devices. To be concrete, consider fsync's consistency guarantees in the practical context of modern hardware, OSes, and virtualization.
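To make the fsync point concrete, here's the standard POSIX-style incantation for a "durable" write. Note that each step only hands responsibility to the next layer down (page cache, drive firmware, hypervisor), and each of those layers has its own consistency story:

```python
import os
import tempfile

def durable_write(path, data):
    """Ask every layer we can reach to persist the write: flush the
    user-space buffer, fsync the file, then fsync its directory so the
    directory entry itself survives a crash. What the drive firmware
    or a hypervisor does below this point is out of our hands."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # user-space buffer -> kernel
        os.fsync(f.fileno())      # kernel page cache -> device (we hope)
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)             # persist the directory entry too
    finally:
        os.close(dfd)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "block.dat")
    durable_write(p, b"such and such value")
    with open(p, "rb") as f:
        roundtrip = f.read()
```

Even this careful version proves nothing about the bottom of the stack: a drive with a volatile write cache, or a hypervisor that acknowledges the flush early, can still lose the block -- the dance continues below the syscall boundary.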
The end-to-end principle can inform us here. The interaction semantics are fixed at precise delineations in the system. It should not matter that behind the facade there is MySQL or just plain-jane mmap'ed files. You, as the programmer, are modeling an interaction via a specific set of interfaces. So if at the surface level the system you are interacting with has AP characteristics, it is irrelevant whether its inner workings are CP.
That said ..
> Now it's not clear to me why you need to make the distinction between CP and AP. ICG is orthogonal to these categories.
Possibly because of the C in ICG :) A CP system can only be inconsistent if it is faulted. I am having difficulty understanding how/why a non-faulty CP system would have incremental consistency states.
> I'll now run the risk of taking it a bit too far, but I think whenever an application chooses a consistency model, that model inherently comes along with its own preset availability guarantee. As consistency models get stronger, there is an instant switch from available to non-available models. [I should cite some refs here but I'm in a hurry.] Consequently, if you have incremental consistency guarantees then you also have decreasing availability guarantees.
> Correctables
Merge conflict?
> Speculate
Optimistic commit?
[1]: referenced preprint at https://arxiv.org/pdf/1609.02434v1.pdf