The index only should apply over non-null values. I'm not sure how anything else...

masklinn · on April 14, 2023

That is equivalent to the DISTINCT semantics.

First, it should be noted that this toggle was added because the spec was ambiguous and different databases have different defaults, specifically:

- BigQuery and Db2 forbid nullable columns in indexes, solving the issue

- most databases default to NULLS DISTINCT

- SQL Server defaults to NULLS NOT DISTINCT

- as usual Oracle huffs glue in the corner (it generally uses NULLS NOT DISTINCT, except in the case where every column in the constraint is null, in which case it uses NULLS DISTINCT; I assume because it just doesn’t apply the constraint at all)

And of course something else can make sense: because SQL’s NULL covers multiple use cases (UNKNOWN, missing, intentionally left out), a lot is left to the details of the modelling.

For instance, let’s say you’re modelling an FS-like structure: you have records which have a parent and a name, and the name needs to be unique within a parent.

That constraint still needs to apply to the top-level items, and while you could model top-level-ness through a special-case hard-coded record, you could just as well model it by leaving the parent field NULL, in which case you’d need `UNIQUE (parent, name)` where “parent” is NULLS NOT DISTINCT. This is especially common under organic growth.
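To make the problem concrete, here is a minimal sketch using Python's bundled SQLite, which treats NULLs as distinct in UNIQUE constraints. The `node` table, its columns and the `try_insert` helper are hypothetical names made up for illustration; they are not from any real schema.

```python
import sqlite3

# Hypothetical schema for the FS-like example; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE node (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES node(id),  -- NULL means "top-level"
        name   TEXT NOT NULL,
        UNIQUE (parent, name)
    )
    """
)


def try_insert(parent, name):
    """Return True if the row was accepted, False if the UNIQUE constraint rejected it."""
    try:
        conn.execute("INSERT INTO node (parent, name) VALUES (?, ?)", (parent, name))
        return True
    except sqlite3.IntegrityError:
        return False


top_first = try_insert(None, "etc")    # top-level "etc", gets id 1
top_dup = try_insert(None, "etc")      # also accepted: the two NULL parents count as distinct
child_first = try_insert(1, "passwd")  # "passwd" under node 1
child_dup = try_insert(1, "passwd")    # rejected: same non-NULL parent, same name

print(top_first, top_dup, child_first, child_dup)
```

The duplicate top-level name slips through while the duplicate child is rejected, which is exactly the gap described above. In PostgreSQL 15 and later the constraint can be written as `UNIQUE NULLS NOT DISTINCT (parent, name)`, which should reject the second top-level row as well; SQLite has no such option as far as I know, so that part is not shown running here.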

Adding a separate boolean flag is redundant and distinctly ugly; I’m not sure expressions are supported by every db (and they’re not free); and while sometimes a sentinel value can be used, that’s not the case in general.
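For reference, the expression route looks roughly like this where it is available. This is a sketch against SQLite (which supports indexes on expressions), again with a hypothetical `node` table. Note that it still leans on a sentinel inside the expression: it assumes 0 is never a real node id, which happens to hold here but is precisely the kind of assumption that does not hold in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT NOT NULL)"
)
# Assumption: 0 is never used as a node id, so inside the index (and only there)
# it can stand in for "no parent". The stored column stays NULL.
conn.execute(
    "CREATE UNIQUE INDEX node_parent_name ON node (COALESCE(parent, 0), name)"
)


def accepted(parent, name):
    """Return True if the insert succeeded, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO node (parent, name) VALUES (?, ?)", (parent, name))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted(None, "etc"),  # first top-level "etc"
    accepted(None, "etc"),  # duplicate top-level name, now rejected
    accepted(None, "usr"),  # different top-level name
    accepted(1, "etc"),     # same name under a real parent
]
print(results)
```

The cost is an extra index to maintain on every write, and any query that wants to use it has to repeat the same `COALESCE` expression.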