We're not talking about unexplained bugs here. We're talking about a pointer that obviously has one bit flipped: flip that one bit back and it would be correct.
Caches and registers are also subject to bitflips. In many CPUs the caches use ECC, so it's less of a problem there. Intel did a study showing that many bits in registers are unused, so flipping them doesn't cause problems.
A common case: a pointer into unallocated address space triggers a segfault, and when you look at the pointer you can see that it's valid except for one bit.
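A rough sketch of the "valid except for one bit" check, assuming you already have a candidate for what the pointer should have been (say, a neighbouring valid pointer from the same crash dump). The function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if observed and expected differ in exactly
 * one bit position, i.e. flipping a single bit turns one into the other. */
static bool differs_by_one_bit(uint64_t observed, uint64_t expected)
{
    /* XOR leaves a 1 in every bit position where the two values differ. */
    uint64_t diff = observed ^ expected;

    /* Exactly one bit set: nonzero, and clearing the lowest set bit
     * leaves nothing behind. */
    return diff != 0 && (diff & (diff - 1)) == 0;
}
```

This only says the two values are one bit apart. It says nothing about whether hardware or software caused the difference.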
Except no one is distinguishing whether the bit flip is in the pointer itself, in the data being pointed to, or in a non-pointer value. Given how we write software, there are a lot more bits outside pointer values that still end up "contributing" to a pointer value. E.g. if some offset field that's added to a pointer has a bit flip, the resulting pointer also has a bit flip. But that offset field could just as well have had a mask applied or a bit set by mistake in software, given how close & is to && and | is to ||.
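To make the & vs && point concrete, here is a small made-up example (the function names and values are hypothetical, not taken from any real crash). The typo version compiles fine and, for some inputs, produces an address exactly one bit away from the intended one:

```c
#include <stddef.h>
#include <stdint.h>

/* Intended: add the masked offset to the base address. */
static uintptr_t element_addr(uintptr_t base, size_t offset, size_t mask)
{
    return base + (offset & mask);
}

/* Typo: && is a logical AND, so the expression is 0 or 1,
 * not the masked offset. */
static uintptr_t element_addr_typo(uintptr_t base, size_t offset, size_t mask)
{
    return base + (offset && mask);
}
```

With base 0x1000, offset 0x100 and mask 0xff, the intended version gives 0x1000 and the typo version gives 0x1001. In a crash dump that looks just like a single flipped bit.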
I think you could get much of the way there by isolating a single NIC's receive queues, so the kernel doesn't decide to run off and service softirqs for random foreign tasks just because your task called tcp_sendmsg.