Hacker News

I mostly agree but I think there are too many weird UB and corner cases to qualify as a small language. I guess C compiled with -fwrapv -fno-strict-overflow -fno-strict-aliasing and a few others might come close to what you're describing.
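To make the `-fwrapv` part concrete, here is a rough sketch in Rust (the language of the snippets later in this thread), where the overflow policy is chosen at each call instead of by a compiler flag. The helper names `add_wrapping` and `add_checked` are made up for illustration; `wrapping_add` and `checked_add` are standard library methods on `i32`.

```rust
/// Two's-complement wrap on overflow, roughly the behaviour that
/// `-fwrapv` asks GCC to give signed arithmetic in C.
/// (Hypothetical helper name, for illustration only.)
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Reports overflow to the caller instead of wrapping:
/// `None` means the mathematical result does not fit in an `i32`.
/// (Hypothetical helper name, for illustration only.)
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```

With these, `add_wrapping(i32::MAX, 1)` gives `i32::MIN`, while `add_checked(i32::MAX, 1)` gives `None`. In plain C without the flag, the equivalent signed overflow is UB.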

I think C's apparent simplicity lulls me into complacency at times. I delude myself into thinking that I'm coding in some kind of macro assembly and that I know what the resulting machine code will look like. And then some super weird optimization or UB kicks in and nothing makes sense anymore, because I stopped playing by the rules and triggered the footgun.

Just look at the number of reports on the GCC bug tracker for code that at a glance ought to work, where it turns out that it's not actually a compiler bug: the code triggered a subtle UB and the compiler ran away with it and generated code that ate your cat.



> I mostly agree but I think there are too many weird UB and corner cases to qualify as a small language.

I don't see the point of your claim regarding undefined behaviour. The rules are quite simple: undefined behavior means compiler-specific behavior. Therefore, if you aim for compiler independence then you don't use it. If somehow you decide to target a compiler then you read the compiler's docs. It's that simple.

These UB complaints are even more ridiculous when we realize they complain about the fact that the language is actually defined.


UB is not implementation-defined, it's UB. Some compilers have options to defuse certain classes of UB, but then you're effectively coding in a non-compatible dialect of C. Otherwise you can't ever rely on a certain UB behaving one way or another: a simple compiler update, code change or compiler flag modification could break everything. A compiler is under no obligation to define what it does in case of UB, and that's the point of it: it leaves some room for aggressive optimization.

There are "implementation defined" details in the C standard but it's a different problem, see for instance: https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html

Anyway, that wasn't really my point. The problem is that some of these UB can arise because of subtle bugs in code that might not look suspicious at a glance: things like breaking aliasing rules, misusing unions, casting between incompatible types, etc. Your code triggers a UB and you don't know it. Actually you might not notice it until you turn on an optimization flag or update your compiler, and suddenly it doesn't do what you want anymore.

Even something as trivial as computing a pointer that's more than one element past the end of an object is UB, for instance (not dereferencing it, merely computing its address). For that reason `ptr.offset` is unsafe in Rust, even though it doesn't dereference the pointer.
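As a minimal sketch of what that `unsafe` contract looks like in practice, the function below only calls `offset` after checking that the result stays inside the slice, so the in-bounds requirement is met by construction. The name `nth_via_offset` is hypothetical; `as_ptr` and `offset` are the standard raw-pointer APIs.

```rust
/// Reads element `i` of `xs` through a raw pointer, or returns `None`
/// when `i` is out of range. (Hypothetical helper, for illustration.)
fn nth_via_offset(xs: &[u8], i: usize) -> Option<u8> {
    if i >= xs.len() {
        // Refuse to even compute the pointer: `offset` itself would be
        // UB if the result landed outside the allocation.
        return None;
    }
    // SAFETY: `i < xs.len()`, so the computed pointer stays inside the
    // slice's allocation and points at an initialized `u8`.
    unsafe { Some(*xs.as_ptr().offset(i as isize)) }
}
```

The bounds check is the caller's side of the bargain: without it, the `offset` call alone could already be UB, before any dereference happens.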


I find it a bit silly that `ptr.offset` is unsafe, but casting an arbitrary integer to a pointer isn't. E.g.:

    fn main() {
        // Look, an invalid pointer, no `unsafe` required.
        let ptr = 1000 as *const u8;
        // boom, segfault.
        println!("{}", unsafe { *ptr });
    }

Using casting one can even implement a "safe" pointer offset function, like so:

    fn main() {
        fn safe_offset<T>(ptr: *const T, offset: isize) -> *const T {
            ((ptr as usize).wrapping_add(offset as usize)) as *const T
        }
        let xs = [0u8, 10];
        let ptr = safe_offset(&xs[0], 1);
        println!("{}", unsafe { *ptr }); // prints '10'
    }

Obviously this "safe_offset" function can easily be used to trigger UB by computing invalid pointers, and not a single line of unsafe code was required (although we do need `unsafe` to dereference the bad pointer and actually trigger segfaults).


I believe this is because `offset` uses an LLVM intrinsic with the extra requirements, as it uses that info for optimization. Your version doesn't.
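For what it's worth, the standard library also has a safe sibling of `offset`, namely `wrapping_offset`, which drops the in-bounds requirement. A small sketch comparing it with the cast-based approach (the function names `cast_offset` and `std_offset` are made up here; note that the cast version counts bytes while `wrapping_offset` counts elements of `T`):

```rust
/// Cast-based offset in the style of the comment above: goes through
/// `usize`, so the offset is measured in bytes.
/// (Hypothetical helper name, for illustration only.)
fn cast_offset<T>(ptr: *const T, bytes: isize) -> *const T {
    ((ptr as usize).wrapping_add(bytes as usize)) as *const T
}

/// Safe standard-library offset: measured in elements of `T`, and
/// computing the pointer is never UB by itself.
/// (Hypothetical helper name, for illustration only.)
fn std_offset<T>(ptr: *const T, elems: isize) -> *const T {
    ptr.wrapping_offset(elems)
}
```

For a `*const u32`, `std_offset(p, 2)` and `cast_offset(p, 8)` yield the same address. In both cases only the later dereference needs `unsafe`, and it is only sound if the pointer really ends up inside the original object.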


Interesting. However, isn't casting a random (potentially invalid) integer into a pointer triggering the same potential UB? I ask because for GCC it apparently is:

>When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8.

https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-imple...

That's pretty explicitly what Thiez was doing in their Rust code, although obviously Rust/LLVM might have different semantics here.
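For contrast, here is a sketch of the round trip that the GCC wording does allow: pointer to integer and back, ending up at the same object. It is written in Rust on the assumption that Rust treats this case the same way, which the thread does not confirm. The name `round_trip` is hypothetical.

```rust
/// Casts a reference to an integer address and back, then reads through
/// the recovered pointer. (Hypothetical helper, for illustration.)
fn round_trip(x: &u32) -> u32 {
    let addr = x as *const u32 as usize; // pointer -> integer
    let back = addr as *const u32; // integer -> pointer, same address
    // SAFETY (assumed): `back` has the address of `x`, which is live,
    // aligned and initialized for the duration of this call.
    unsafe { *back }
}
```

The problematic variant is the one where integer arithmetic in between makes the pointer land in a different object, which is what the GCC passage rules out.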


The docs list no undefined behaviour for inttoptr, so it's (probably) not problematic at the LLVM level: http://llvm.org/docs/LangRef.html#inttoptr-to-instruction .


The exact rules of unsafe code are still up in the air. It’s not explicitly defined as UB yet, IIRC, and when we set the rules, we have a goal of not invalidating large swaths of code.


> if you aim for compiler independence then you don't use [undefined behavior].

I think the point was that UB occurs in a lot of relatively common cases and programmers don't realize that they're depending on it/experiencing it.

> If [somehow] you decide [to] target a compiler then you read the compiler's docs.

Which is why the post you're responding to made the point about undefined behavior. "Read the compiler's docs" is in counterpoint to that post's parent, which praises C for being a small language, and thus one in which reading the language's docs is seldom required.



