geezerjay · on March 14, 2018

> I mostly agree but I think there are too many weird UB and corner cases to qualify as a small language.

I don't see the point of your claim regarding undefined behavior. The rules are quite simple: undefined behavior means compiler-specific behavior. Therefore, if you aim for compiler independence then you don't use it. If somehow you decide to target a compiler then you read the compiler's docs. It's that simple.

These UB complaints are even more ridiculous when we realize they complain about the fact that the language is actually defined.

simias · on March 14, 2018

UB is not implementation-defined, it's UB. Some compilers have options to defuse certain classes of UB, but then you're effectively coding in a non-compatible dialect of C. Otherwise you can't ever rely on a certain UB behaving one way or another: a simple compiler update, code change or compiler flag modification could break everything. A compiler is under no obligation to define what it does in case of UB, and that's the point of it: it leaves some room for aggressive optimization.
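To make the "room for aggressive optimization" point concrete: signed integer overflow is UB in C, so a C compiler may assume `x + 1 > x` always holds for a signed `x` and fold the comparison away. Below is a minimal Rust sketch of the same question, where the overflow case has to be spelled out instead. The helper name `next_is_greater` is made up for illustration and is not from this thread.

```rust
/// Hypothetical helper: is `x + 1` greater than `x`?
/// `checked_add` returns `None` on overflow, so the overflowing case is
/// handled explicitly instead of being left undefined.
fn next_is_greater(x: i32) -> Option<bool> {
    x.checked_add(1).map(|next| next > x)
}

fn main() {
    // Ordinary case: 2 > 1.
    println!("{:?}", next_is_greater(1));
    // Overflow case: reported as None rather than assumed impossible.
    println!("{:?}", next_is_greater(i32::MAX));
    // Explicit wrapping is also defined: MAX + 1 wraps around to MIN.
    println!("{}", i32::MAX.wrapping_add(1) == i32::MIN);
}
```

The explicit `checked_add` / `wrapping_add` methods behave the same way regardless of optimization level, which is exactly the guarantee UB does not give you.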

There are "implementation defined" details in the C standard but it's a different problem, see for instance: https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html

Anyway, that wasn't really my point. The problem is that some of these UBs can arise because of subtle bugs in code that might not look suspicious at a glance: breaking aliasing rules, misusing unions, casting between incompatible types, etc. Your code triggers UB and you don't know it. Actually, you might not notice it until you turn on an optimization flag or update your compiler, and suddenly it doesn't do what you want anymore.
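As an illustration of how innocent this can look: in C the tempting way to read the bits of a `float` is `*(uint32_t *)&f`, which breaks the aliasing rules. A rough sketch of the same task in Rust, using only standard library conversions and no pointer casts, follows. The helper names `float_bits` and `u32_from_bytes` are invented for this example.

```rust
/// Hypothetical helper: view the bits of an `f32` as a `u32`.
/// `f32::to_bits` does the reinterpretation without any pointer cast.
fn float_bits(x: f32) -> u32 {
    x.to_bits()
}

/// Hypothetical helper: reassemble a `u32` from four bytes in native order.
fn u32_from_bytes(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

fn main() {
    // 1.0f32 is 0x3f800000 in IEEE 754 single precision.
    println!("{:#010x}", float_bits(1.0));
    // A round trip through the byte representation gives the value back.
    let n = 0xdead_beef_u32;
    println!("{}", u32_from_bytes(n.to_ne_bytes()) == n);
}
```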

Even something as trivial as computing a pointer that's more than one byte after the end of an object is UB (not dereferencing it, merely computing its address). For that reason `ptr.offset` is unsafe in Rust, for instance, even though it doesn't dereference the pointer.

Thiez · on March 15, 2018

I find it a bit silly that `ptr.offset` is unsafe, but casting an arbitrary integer to a pointer isn't. E.g.:

    fn main() {
        // Look, an invalid pointer, no `unsafe` required.
        let ptr = 1000 as *const u8;
        // boom, segfault.
        println!("{}", unsafe { *ptr });
    }

Using casting one can even implement a "safe" pointer offset function, like so:

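    fn main() {
        // Moves `ptr` by `offset` elements using integer arithmetic only.
        fn safe_offset<T>(ptr: *const T, offset: isize) -> *const T {
            // Scale by the element size so this matches `ptr.offset` for any `T`.
            let bytes = offset.wrapping_mul(std::mem::size_of::<T>() as isize);
            ((ptr as usize).wrapping_add(bytes as usize)) as *const T
        }
        let xs = [0u8, 10];
        let ptr = safe_offset(&xs[0], 1);
        println!("{}", unsafe { *ptr }); // prints '10'
    }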

Obviously this "safe_offset" function can easily be used to trigger UB by computing invalid pointers, and not a single line of unsafe code was required (although we do need `unsafe` to dereference the bad pointer and actually trigger segfaults).
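For comparison, current Rust also has safe `wrapping_add` / `wrapping_sub` methods on raw pointers, which drop the in-bounds requirement of `offset` at the point where the pointer is computed. A small sketch, assuming a reasonably recent toolchain; the helper `out_and_back` is hypothetical and only there to show that leaving the object and coming back is allowed:

```rust
/// Hypothetical helper: step `offset` elements past `ptr` and come back,
/// using only the safe `wrapping_add` / `wrapping_sub` pointer methods.
fn out_and_back<T>(ptr: *const T, offset: usize) -> *const T {
    // The intermediate pointer may be far out of bounds; merely computing
    // it is fine with the wrapping methods.
    let far = ptr.wrapping_add(offset);
    far.wrapping_sub(offset)
}

fn main() {
    let xs = [0u8, 10];
    let base = xs.as_ptr();
    // Leave the array by 1000 elements, then return to where we started.
    let back = out_and_back(base, 1000);
    println!("{}", back == base);
    // The pointer is inside `xs` again, so reading through it is allowed.
    println!("{}", unsafe { *back.wrapping_add(1) });
}
```

As with `safe_offset`, the `unsafe` only shows up at the dereference, which is where the pointer actually has to be valid.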

steveklabnik · on March 15, 2018

I believe this is because `offset` uses an LLVM intrinsic with the extra requirements, as it uses that info for optimization. Your version doesn't.

simias · on March 15, 2018

Interesting. However, doesn't casting a random (potentially invalid) integer into a pointer trigger the same potential UB? I ask because for GCC it apparently does:

> When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-imple...

That's pretty explicitly what Thiez was doing in their Rust code, although obviously Rust/LLVM might have different semantics here.

dbaupp · on March 15, 2018

The docs list no undefined behaviour for inttoptr, so it's (probably) not problematic at the LLVM level: http://llvm.org/docs/LangRef.html#inttoptr-to-instruction .

steveklabnik · on March 15, 2018

The exact rules of unsafe code are still up in the air. It’s not explicitly defined as UB yet, IIRC, and when we set the rules, we have a goal of not invalidating large swaths of code.

zbentley · on March 14, 2018

> if you aim for compiler independence then you don't use [undefined behavior].

I think the point was that UB occurs in a lot of relatively common cases and programmers don't realize that they're depending on it/experiencing it.

> If [somehow] you decide [to] target a compiler then you read the compiler's docs.

Which is why the post you're responding to made the point about undefined behavior. "Read the compiler's docs" is in counterpoint to that post's parent, which praises C for being a small language, and thus one in which reading the language's docs is seldom required.