What is the advantage of persistent collections over value-type collections?

karavelov · on April 6, 2018

Structural sharing. It means you don't have to copy the whole data structure but just the path to the root.

millstone · on April 6, 2018

I don't know - that sounds more like an implementation detail available to both. Obviously if you modify the first element of a value-type list, it's not going to copy the rest.

Persistent data structures are really about the interface: functions that would be traditionally mutating instead return new versions. Is there an advantage of that interface over mutable value-type collections? One advantage is that it works better with e.g. folds. But overall I think it's more awkward.

mookid11 · on April 7, 2018

The point is to share intermediate state of your data structure without headache.

For instance, if you write an interpreter, representing variable bindings with a functional map allows to reason about scope super easily; when you change scope, you just create a new map with the new variable bindings, and when the scope ends you just get back the map at the scope beginning which is still valid.

With a stateful hashtable, you need to undo the changes you did in you scope; the value of the table when opening the scope that you want to get back to is gone forever. Now it ties you program to a certain execution order. One way to see that is that if you want to collect environment to each scope in some data structure for some reason, you need to make deep copies everywhere.

masklinn · on April 7, 2018

> I don't know - that sounds more like an implementation detail available to both.

It impacts the performance profile of the collection, and thus isn't merely an implementation detail.

elihu · on April 7, 2018

It seems like it would be nice to just use the immutable interface and have it be mutable in the case that there's only one owner, but be able to fall back to immutability when you need it.

It could make refactoring and debugging easier. For instance, let's say you're part way through some operation and it fails, and the error recovery logic is buggy and leaves the data structure in an inconsistent state. You could debug the problem, but if you're in a hurry and you just need something that works right now, you can store a reference to the data structure at the beginning of the operation and revert to that if anything goes wrong. You'd pay a performance cost since all the direct in-place edits would turn into copies and allocations, but it would at least work.

Similarly, you could develop a program initially in an all-immutable multi-owner style, and then selectively convert codepaths to single-owner style to get better performance in the parts where performance matters.

kbsletten · on April 7, 2018

This, of course requires tracking of owners. If you're on the JVM, that's not so cheap. If you're already tracking pointers or can do compiler-aided RAII like python, or c++ then it's relatively simple unless you also need recursive data structures which "own" themselves.

nine_k · on April 7, 2018

Lack of defensive copying (widespread e.g. in Java) may alone be worth it, performance-wise.

Also, shared data are much easier to reason about if it's immutable (not necessarily even in a concurrent execution context).