They are both; there are things that Rust's macros can do metaprogramming-wise that C++ templates cannot do and vice-versa.
Rust's macros work on a syntactic level, so they are more powerful in that they can work with "normally" invalid code and perform token-to-token transformations (and in the case of proc macros effectively function as compiler extensions/plugins) and less powerful in that they don't have access to semantic information.
> For example you cannot design something that comes even close to expression templates libraries.
You keep saying this and it's still wrong. Rust is quite capable of expression templates, as its iterator adapters prove. What it isn't capable of (yet) is specialization, which is an orthogonal feature.
Rust cannot take a const function and evaluate that into the argument of a const generic or a proc macro. As far as I can tell, the reasons are deeply fundamental to the architecture of rustc. It's difficult to express HOW FUNDAMENTAL this is to strongly typed zero overhead abstractions, and we see where Rust is lacking here in cases like `Option` and bitset implementations.
> Rust is quite capable of expression templates, as its iterator adapters prove.
AFAIU iterator adapters are not quite what expression templates are, because they rely on compiler optimizations rather than on a built-in feature of the language that enables you to do this without relying on the compiler pipeline.
I had always thought expression templates at the very least needed the optimizer to inline/flatten the tree of function calls that are built up. For instance, for something like x + y * z I'd expect an expression template type like sum<vector, product<vector, vector>>, where sum would effectively have an operator[] computing l[i] + r[i] and product an operator[] computing l[i] * r[i].
That would require the optimizer to inline the latter into the former to end up with a single expression, though. Is there a different way to express this that doesn't rely on the optimizer for inlining?
Expression templates do not rely on the optimizer, since you're not dealing with the computations directly but rather with expressions (nodes) through which you defer the computation part until the very last moment (when you have fully built an expression of expressions, basically almost an AST). This guarantees that you get zero cost when you really need it. What you're describing is something akin to copy elision and function folding through inlining, which is pretty much basic in any C++ compiler and happens automatically without special care.
> since you're not dealing with the computations directly but rather with expressions (nodes) through which you defer the computation part until the very last moment (when you have fully built an expression of expressions, basically almost an AST).
Right, I understand that. What is not exactly clear to me is how you get from the tree of deferred expressions to the "flat" optimized expression without involving the optimizer.
Take something like the above example for instance - w = x + y * z for vectors w/x/y/z. How do you get from that to effectively
for (size_t i = 0; i < w.size(); ++i) {
w[i] = x[i] + y[i] * z[i];
}
The example is false because that's not how you would write an expression template for the given computation, so the question of how the optimizer is not involved is also not set in the correct context, and I can't give you an answer to it. Of course the optimizer is generally going to be involved, as it is for all the code and not just expression templates, but expression templates do not require the optimizer in the way you're trying to suggest. Expression templates do not rely on O1, O2 or O3 levels being set - they work the same way in O0 too and that may be the hint you were looking for.
> The example is false because that's not how you would write an expression template for the given computation
OK, so how would you write an expression template for the given computation, then?
> Expression templates do not rely on O1, O2 or O3 levels being set - they work the same way in O0 too and that may be the hint you were looking for.
This claim confuses me given how expression templates seem to work in practice?
For example, consider Todd Veldhuizen's 1994 paper introducing expression templates [0]. If you take the examples linked at the top of the page and plug them into Godbolt (with slight modifications to isolate the actual work of interest) you can see that with -O0 you get calls to overloaded operators instead of the nice flattened/unrolled/optimized operations you get with -O1.
You see something similar with Eigen [2] - you get function calls to "raw" expression template internals with -O0, and you need to enable the optimizer to get unrolled/flattened/etc. operations.
Similar thing yet again with Blaze [3].
At least to me, it looks like expression templates produce quite different outputs when the optimizer is enabled vs. disabled, and the -O0 outputs very much don't resemble the manually-unrolled/flattened-like output one might expect (and arguably gets with optimizations enabled). Did all of these get expression templates wrong as well?
Look, I have just completed work on a high performance serialization library which avoids computing heavy expressions and temporary allocations, all by using expression templates, and no, optimization levels are not needed. The code works as advertised at O0 - that's the whole deal around it. If you have a genuine question you should ask one, but please do not disguise it so that it only goes to prove your point. I am not that naive. All I can say is that your understanding of expression templates is not complete and therefore you draw incorrect conclusions. The silly example you provided shows that you don't understand what expression template code looks like, and yet you're trying to prove your point over and over again. Also, most of the time I am writing my comments on my mobile, so I understand that my responses sometimes appear too blunt, but in any case I am obviously not going to write, run or check the code as if I were at work. My comments here are not work, and I am not here to win arguments but, most of the time, to learn from other people's experiences, and sometimes to dispute conclusions based on those experiences too. If you don't believe me, or you believe expression templates work differently, then so be it.
> If you have a genuine question you should ask one, but please do not disguise it so that it only goes to prove your point.
I think my question is pretty simple: "How does an optimizer-independent expression template implementation work?" Evidently the resources I've found so far describe "optimizer-dependent expression templates", and apparently none of the "expression template" implementations I've had reason to look at disabused me of that notion.
> My comments here are not work, and I am not here to win arguments but, most of the time, to learn from other people's experiences, and sometimes to dispute conclusions based on those experiences too.
Sure, and I like to learn as well from the more knowledgeable/experienced folk here, but as much as I want to do so here I'm finding it difficult since there's precious little for me to go off of beyond basically just being told I'm wrong.
> If you don't believe me, or you believe expression templates work differently, then so be it.
I want to understand how you understand expression templates, but between the above and not being able to find useful examples of your description of expression templates I'm at a bit of a loss.
Expression templates do AST manipulation of expressions at compile time. Let's say you have a complex matrix expression that naively maps to multiple BLAS operations but can be reduced to a single BLAS call. With expression templates you can translate one to the other; this is a static manipulation that does not depend on the optimization level. What does depend on the compiler is whether the incidental trivial function calls to operators get optimized away or not. But, especially with large matrices, the BLAS call will dominate anyway, so the optimization level shouldn't matter.
Of course in many cases the optimization level does matter: if you are optimizing small vector operators to SIMD, inlining will still be important.
> With expression templates you can translate one to the other; this is a static manipulation that does not depend on the optimization level.
How does that work on an implementation level? First thing that comes to mind is specialization, but I wouldn't be surprised if it were something else.
> What does depend on the compiler is whether the incidental trivial function calls to operators get optimized away or not.
> Of course in many cases the optimization level does matter: if you are optimizing small vector operators to SIMD, inlining will still be important.
Perhaps this is the source of my confusion; my uses of expression templates so far have generally been "simpler" ones which rely on the optimizer to unravel things. I haven't been exposed much to the kind of matrix/BLAS-related scenarios you describe.
Partial specialization specifically. Match some patterns and convert them to something else. For example:
#include <cmath>  // std::fma

struct F { double x; };
enum Op { Add, Mul };
double eval(F v) { return v.x; }

template<class L, class R, Op op> struct Expr;

// Generic nodes: defer the computation until eval() is called.
template<class L, class R> struct Expr<L, R, Add> { L l; R r;
  friend double eval(Expr self) { return eval(self.l) + eval(self.r); } };
template<class L, class R> struct Expr<L, R, Mul> { L l; R r;
  friend double eval(Expr self) { return eval(self.l) * eval(self.r); } };

// Rewrite rule: (a * b) + c is matched by this more specialized partial
// specialization and lowered to a single fused multiply-add.
template<class L, class R, class R2> struct Expr<Expr<L, R, Mul>, R2, Add> { Expr<L, R, Mul> l; R2 r;
  friend double eval(Expr self) { return std::fma(eval(self.l.l), eval(self.l.r), eval(self.r)); } };

template<class L, class R>
auto operator+(L l, R r) { return Expr<L, R, Add>{l, r}; }
template<class L, class R>
auto operator*(L l, R r) { return Expr<L, R, Mul>{l, r}; }

double optimized(F x, F y, F z) { return eval(x * y + z); }
double non_optimized(F x, F y, F z) { return eval(x + y * z); }
optimized always generates a call to fma; non_optimized does not. Use -O1 to see the difference (it will inline the trivial functions, but will not do other optimizations). -O0 also generates the fma, but it is lost in the noise.
The magic happens by specifically matching the pattern Expr<Expr<L, R, Mul>, R2, Add>; try to add a rule to optimize x+y*z as well.
Hrm, OK, that makes sense. Thanks for taking the time to explain! Guessing optimizing x+y*z would entail something similar to the third eval() definition but with Expr<L, Expr<L2, R2, Mul>, Add> instead.
I think at this point I can see how my initial assertion was wrong - specialization isn't fully orthogonal to expression templates, as the former is needed for some of the latter's use cases.
Does make me wonder how far one could get with rustc's internal specialization attributes...
> C++26 adds destructive moves. They are called relocatable types.
I thought those were removed? For example, see Herb's 2025-11/Kona trip report [0]:
> For trivial relocatability, we found a showstopper bug that the group decided could not be fixed in time for C++26, so the strong consensus was to remove this feature from C++26.
> there is 30 years of C++ in the real world, initializing everything by default unless you opt-in will break some performance critical code that should not initialize everything
...But the change to EB in this case does initialize everything by default?
No it doesn't. It says the value is unspecified but it exists. Sometimes some compilers did initialize everything before (this was common in debug builds). Some of them will in the future, but most won't do anything different.
The only difference is that some optimizers used to eliminate code paths where they could prove that the path would read an uninitialized variable, causing a lot of weird bugs in the real world.
The precise value is not specified, but whatever value is picked also has to be something that isn't tied to the state of the program so some kind of initialization needs to take place.
Furthermore, the proposal explicitly states that (some) variables are initialized by default:
> Default-initialization of an automatic-storage object initializes the object with a fixed value defined by the implementation
> The automatic storage for an automatic variable is always fully initialized, which has potential performance implications.
> The automatic storage for an automatic variable is always fully initialized, which has potential performance implications.
If that's what they're going for, it's way too much weight to hang on a single vague word like that. Trying to define "state of the program" in a detailed way sounds nightmarish. Let's say I'm the implementation. If I go get fresh (but not zeroed) memory from the OS to put my stack on, the garbage in there isn't state of the program, right? If I then run a function and the function exits, is the garbage now state of the program, or is it outside the state of the program? If I want a fixed init value per address, is that allowed as a hardening feature or disallowed as being based on allocation patterns? Does the as-if rule apply, so I'm fine if the program can't know for sure where I got my arbitrary byte values from?
And would that mean there's still no way to say "Don't waste time initializing it, but don't do any UB shenanigans either. (Basically, pretend it was initialized by a random number generator.)"
> Let's say I'm the implementation. If I go get fresh (but not zeroed) memory from the OS to put my stack on, the garbage in there isn't state of the program, right?
I'd argue that once you get the memory it's now part of the state of your program, which precludes it from being involved in whatever value you end up reading from the variable(s) corresponding to that memory.
> If I want a fixed init value per address, is that allowed as a hardening feature or disallowed as being based on allocation patterns?
I'd guess that that specific implementation would be disallowed, but as I'm an internet nobody I'd take that with an appropriately-sized grain of salt.
> And would that mean there's still no way to say "Don't waste time initializing it, but don't do any UB shenanigans either. (Basically, pretend it was initialized by a random number generator.)"
I feel like you'd need something like LLVM's `freeze` intrinsic for that kind of functionality.
It means what it says on the tin. Whatever value ends up being used must not depend on the state of the program.
> Are we talking about all zeros?
All zeros is an option, but the intent is to allow the implementation to pick other values as it sees fit:
> Note that we do not want to mandate that the specific value actually be zero (like P2723R1 does), since we consider it valuable to allow implementations to use different “poison” values in different build modes. Different choices are conceivable here. A fixed value is more predictable, but also prevents useful debugging hints, and poses a greater risk of being deliberately relied upon by programmers.
> Is the implementation not permitted to use whatever arbitrary value was in memory?
No, because the value in such a case can depend on the state of the program.
> Why not?
Doing so would defeat the purpose of the change, which is to turn nasal-demons-on-mistake into something with less dire consequences:
> In other words, it is still an error to read an uninitialized value, but if you do read it and the implementation does not otherwise stop you, you get some specific value. In general, implementations must exhibit the defined behaviour, at least up until a diagnostic is issued (if ever). There is no risk of running into the consequences associated with undefined behaviour (e.g. executing instructions not reflected in the source code, time-travel optimisations) when executing erroneous behaviour.
> What’s up with [[indeterminate]]?
The idea is to provide a way to opt into the old full-UB behavior if you can't afford the cost of the new behavior.
> I would expect “indeterminate” to mean that the variable has a value that happens to be arbitrary (and may contain sensitive data, etc), not that it turns back into actual UB.
I believe the spelling matches how the term was used in previous standards. For example, from the C++23 standard [0] (italics in original):
> When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.
> Doing so would defeat the purpose of the change, which is to turn nasal-demons-on-mistake into something with less dire consequences
What nasal demons?
UB is permitted to format your disk, execute arbitrary code, etc. But there’s lots of room between deterministic values and UB. For example, taking a value that does depend on the previous state of the program and calling it the “erroneous” value would give a non-UB, won’t format your hard disk solution. And it even makes quite a lot of performance sense: the value that was already in the register or at that address in memory is available for free! The difference from C++23 would be that using that value would merely be erroneous and not UB.
And I think the word “indeterminate” should have been reserved for that sort of behavior.
Those that result from the pre-C++26 behavior where use of an indeterminate value is UB.
> But there’s lots of room between deterministic values and UB.
That's a fair point. I do think I made a mistake in how I represented the authors' decision, as it seems the authors intentionally wanted the predictability of fixed values (italics added):
> Reading an uninitialized value is never intended and a definitive sign that the code is not written correctly and needs to be fixed. At the same time, we do give this code well-defined behaviour, and if the situation has not been diagnosed, we want the program to be stable and predictable. This is what we call erroneous behaviour.
> And I think the word “indeterminate” should have been reserved for that sort of behavior.
Perhaps, but that'd be a departure from how the word has been/is used in the standard so there would probably be some resistance against redefining it.
No, in general Rust doesn't (and can't) know whether an arbitrary function has side effects. The compiler does arguably have a leg up since Rust code is typically all built from source, but there are still things like FFI that act as visibility barriers for the compiler.
Right, so strictly speaking C++ could do anything here when passed a null pointer, because even though assert terminates the program, the C++ compiler cannot see that, and there is then undefined behaviour in that case.
> because even though assert terminates the program, the C++ compiler cannot see that
I think it should be able to. I'm pretty sure assert is defined to call abort when triggered and abort is tagged with [[noreturn]], so the compiler knows control flow isn't coming back.
Shouldn't control flow diverge if the assert is triggered when NDEBUG is not defined? Pretty sure assert is defined to call abort when triggered and that is tagged [[noreturn]].