> code for radiation hardened environments I’m aware of code that detects bit fl...

gmueckl · 2026-03-06T07:30:43 1772782243

For safety critical systems, one strategy is to store at least two copies of important data and compare them regularly. If they don't match, you either try to recover somehow or go into a safe state, depending on the context.
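
A minimal sketch of the two-copy idea, assuming the data is a byte string and that "safe state" is signalled by raising an exception. `DualStore` and `SafeStateError` are hypothetical names, not from any real library:

```python
class SafeStateError(Exception):
    """Hypothetical: raised when the two copies disagree; the caller should fail safe."""


class DualStore:
    """Hypothetical helper: keeps two copies of a value and compares them on read."""

    def __init__(self, value: bytes) -> None:
        # On real hardware the copies would sit in physically separate memory regions.
        self.copy_a = bytearray(value)
        self.copy_b = bytearray(value)

    def read(self) -> bytes:
        # Compare on every read; a periodic "scrub" task could do the same check.
        if self.copy_a != self.copy_b:
            # Two copies can detect a mismatch but cannot say which one is wrong,
            # so the sound reaction is to stop and enter a safe state.
            raise SafeStateError("stored copies disagree")
        return bytes(self.copy_a)


store = DualStore(b"setpoint=42")
assert store.read() == b"setpoint=42"

store.copy_b[0] ^= 0x01  # simulate a single-event upset (one flipped bit)
try:
    store.read()
except SafeStateError as exc:
    print("entering safe state:", exc)
```

Detection only: with two copies there is no way to vote, which is fine when shutting down is itself safe.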

d1sxeyes · 2026-03-06T07:44:25 1772783065

At least three copies, so you can recover based on consensus.
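
A sketch of recovery by consensus (triple modular redundancy), assuming byte-string data. The vote here is bitwise, so it still recovers if different copies are hit in different bit positions; it only fails when the same bit is corrupted in two copies. `majority_vote` and `TripleStore` are hypothetical names:

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 vote across three equal-length copies."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must have equal length")
    # For each bit, the result is 1 iff at least two of the three copies have a 1.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


class TripleStore:
    """Hypothetical helper: three copies, repaired by consensus on every read."""

    def __init__(self, value: bytes) -> None:
        self.copies = [bytearray(value) for _ in range(3)]

    def read(self) -> bytes:
        voted = majority_vote(*(bytes(c) for c in self.copies))
        # Scrub: write the consensus back so a flipped bit does not linger
        # until a second upset hits the same bit in another copy.
        for c in self.copies:
            c[:] = voted
        return voted


store = TripleStore(b"setpoint=42")
store.copies[1][0] ^= 0x01  # one upset in copy 1
store.copies[2][5] ^= 0x10  # another upset, different byte, in copy 2
print(store.read())         # b'setpoint=42'
```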

Dylan16807 · 2026-03-06T07:52:53 1772783573

If your pieces of important data are very tiny, that's probably your best option.

If they're hundreds of bytes or more, then two copies plus two hashes will do a better job.
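
A sketch of "two copies plus two hashes": each copy carries its own checksum, and a read restores the bad copy from whichever one still matches. CRC-32 from Python's `zlib` stands in for "hash" here (my choice; any checksum or hash works), and `HashedPairStore` is a hypothetical name. For N bytes of data this costs about 2N + 8 bytes instead of 3N for three full copies:

```python
import zlib


class HashedPairStore:
    """Hypothetical helper: two copies, each stored with its own CRC-32."""

    def __init__(self, value: bytes) -> None:
        self.copies = [bytearray(value), bytearray(value)]
        self.checks = [zlib.crc32(value), zlib.crc32(value)]

    def _valid(self, i: int) -> bool:
        # A copy is trusted only if it still matches the checksum stored with it.
        return zlib.crc32(self.copies[i]) == self.checks[i]

    def read(self) -> bytes:
        for i in (0, 1):
            if self._valid(i):
                good = bytes(self.copies[i])
                # Restore the other copy (and its checksum) from the good one.
                j = 1 - i
                self.copies[j][:] = good
                self.checks[j] = self.checks[i]
                return good
        # Both copies fail their checks: unrecoverable, so fail safe.
        raise RuntimeError("both copies corrupted")


store = HashedPairStore(b"calibration-table-v1")
store.copies[0][3] ^= 0x04  # flip a bit in copy 0
print(store.read())         # b'calibration-table-v1' (restored from copy 1)
```

A single stored hash would also do, since both copies should hash to the same value; keeping one per copy just makes each copy self-contained.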

d1sxeyes · 2026-03-06T12:24:12 1772799852

Ah, true! You just restore the one that matches its hash. Elegant.

rixed · 2026-03-06T16:39:56 1772815196

A single hash should be enough.

Dylan16807 · 2026-03-06T19:55:07 1772826907

Yes, but what's easier depends on layout. "Consensus" makes me think of multiple entire nodes, and in that situation you can have a nice symmetry by making each node store one copy and one small hash.

If you're doing something that's more centralized then one hash might be simpler, but if you're centralized then you should probably use your own error correction codes instead of having multiple copies.
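
As an illustration of an error-correcting code in place of copies, here is a sketch of Hamming(7,4), my choice of code rather than one named above. It stores 4 data bits in 7 (1.75x overhead versus 3x for triple copies) and corrects any single flipped bit. The function names are hypothetical; ECC memory typically uses a longer SECDED variant of the same idea:

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = [(nibble >> k) & 1 for k in range(4)]
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    # XOR of the 1-based positions of all set bits: 0 for a valid codeword,
    # otherwise exactly the position of the single flipped bit.
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(code)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    d1, d2, d3, d4 = fixed[2], fixed[4], fixed[5], fixed[6]
    return d1 | (d2 << 1) | (d3 << 2) | (d4 << 3)


word = hamming74_encode(0b1011)
word[4] ^= 1                        # flip one of the 7 stored bits
print(bin(hamming74_decode(word)))  # 0b1011
```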

qznc · 2026-03-06T16:48:17 1772815697

In many cases the system is perfectly safe when it shuts off. Two is enough for that.

pizza · 2026-03-06T11:20:51 1772796051

“never go to sea with two chronometers, take one or three”

DennisP · 2026-03-06T16:56:17 1772816177

Seems like chronometers would be a case where two are better than one, because the mistakes are analog. If they don't exactly agree, just take the average. You'll have more error than if you were lucky enough to take the better chronometer, but less than if you had taken only the worse one. Minimizing the worst case is probably the best way to stay off the rocks.
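
A quick check of the worst-case claim: write each reading as the true time $t$ plus an error, $a = t + e_a$ and $b = t + e_b$ (my notation). By the triangle inequality:

```latex
\left|\frac{a+b}{2} - t\right|
  = \frac{|e_a + e_b|}{2}
  \le \frac{|e_a| + |e_b|}{2}
  \le \max\left(|e_a|, |e_b|\right)
```

So the average's error is at most the mean of the two errors and never worse than the worse chronometer, with equality only when both are off by the same amount in the same direction.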

robocat · 2026-03-08T22:05:08 1773007508

And for failures where a chronometer simply breaks, two is way better than one! Having zero working chronometers would be bad.

DennisP · 2026-03-09T01:34:04 1773020044

And come to think of it, if the two chronometers are wrong in different directions, then the average could be more accurate than either of them.
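
Making that precise, with readings $a = t + e_a$ and $b = t + e_b$ around the true time $t$ (my notation): if the errors have the same sign, the average's error is $(|e_a| + |e_b|)/2 \ge \min(|e_a|, |e_b|)$, so it can't beat the better chronometer. If they have opposite signs, with $|e_a| \ge |e_b| > 0$:

```latex
\left|\frac{a+b}{2} - t\right| = \frac{|e_a| - |e_b|}{2} < |e_b|
  \iff |e_a| < 3\,|e_b|
```

So the average beats both exactly when the errors point in opposite directions and the larger is less than three times the smaller. For example, $e_a = +4$ s and $e_b = -2$ s give an averaged error of $+1$ s.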

Helmut10001 · 2026-03-06T09:35:01 1772789701

I use ZFS even on consumer devices, these days. Parity checks all the way!

vntok · 2026-03-06T07:22:32 1772781752

You can have voting systems in place, where at least 2 out of 3 different code paths have to produce the same output for it to be accepted. This can be done with multiple systems (by multiple teams/vendors) or more simply with multiple tries of the same path, provided you fully reload the input in between.
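
A sketch of 2-out-of-3 output voting, assuming each "code path" is a function of the input bytes that returns a hashable result, and that `load_input` rereads the input from its source on every call. `vote` and the three example versions are hypothetical; the third one fakes a fault:

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def vote(
    versions: Sequence[Callable[[bytes], Hashable]],
    load_input: Callable[[], bytes],
    quorum: int = 2,
) -> Hashable:
    """Run each version on a freshly loaded input; accept an output only if
    at least `quorum` versions agree on it."""
    results = []
    for version in versions:
        # Reload the input for every run so a corrupted buffer from one run
        # cannot silently feed the next one.
        results.append(version(load_input()))
    value, count = Counter(results).most_common(1)[0]
    if count < quorum:
        raise RuntimeError("no quorum: outputs disagree")
    return value


def sum_loop(data: bytes) -> int:
    total = 0
    for b in data:
        total += b
    return total


def sum_builtin(data: bytes) -> int:
    return sum(data)


def sum_faulty(data: bytes) -> int:
    return sum(data) ^ 1  # simulated fault: one flipped bit in the result


print(vote([sum_loop, sum_builtin, sum_faulty], lambda: b"\x01\x02\x03"))  # 6
```

For the "multiple tries of the same path" variant, pass the same function three times, e.g. `vote([sum_loop] * 3, load_input)`.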

qznc · 2026-03-06T07:24:40 1772781880

The simplest one is a watchdog: if something stops sending regular notifications, restart it.
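
A software sketch of the watchdog pattern, assuming the monitored task calls `kick()` regularly and an independent supervisor calls `check()` periodically. The class and callback names are hypothetical; on real hardware the watchdog is usually an independent timer that resets the processor. The clock is injectable so the behaviour can be shown deterministically:

```python
import time
from typing import Callable


class Watchdog:
    """Hypothetical software watchdog: if the monitored task stops calling
    kick() within `timeout` seconds, `on_timeout` is invoked (e.g. a restart)."""

    def __init__(self, timeout: float, on_timeout: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_timeout = on_timeout
        self.clock = clock
        self.last_kick = clock()

    def kick(self) -> None:
        # Called regularly by the healthy task ("still alive").
        self.last_kick = self.clock()

    def check(self) -> bool:
        # Called by an independent supervisor (thread, process, or timer).
        if self.clock() - self.last_kick > self.timeout:
            self.on_timeout()
            self.last_kick = self.clock()  # give the restarted task a fresh window
            return True
        return False


now = [0.0]
restarts = []
wd = Watchdog(1.0, lambda: restarts.append(now[0]), clock=lambda: now[0])
now[0] = 0.5; wd.kick()
now[0] = 1.2; print(wd.check())  # False: 0.7 s since the last kick
now[0] = 2.0; print(wd.check())  # True: 1.5 s without a kick, restart
print(restarts)                  # [2.0]
```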

gmueckl · 2026-03-06T07:32:55 1772782375

A watchdog guards against unresponsive software. It doesn't protect against bad data directly. Not all bad data makes a system freeze.