Actually, I'm not surprised.
I suspect this works for the same reason "say it twice" [1] works. Because LLMs are trained as causal language models, past tokens cannot attend to future tokens.
One extra copy of the layer set solves this.
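To make the causal constraint concrete, here's a minimal NumPy sketch (my own toy illustration, not from either paper) of a causal attention mask: row i is the query token, and every column j > i is masked out, so an earlier token literally has zero attention weight on anything that comes after it.

```python
import numpy as np

# Toy causal self-attention mask for a 5-token sequence.
# Row i = query token i; column j = key token j.
# Causal LMs mask out j > i, so earlier tokens never see later ones.
T = 5
mask = np.tril(np.ones((T, T), dtype=bool))  # lower-triangular: only j <= i allowed

rng = np.random.default_rng(0)
scores = rng.standard_normal((T, T))         # stand-in attention logits
scores = np.where(mask, scores, -np.inf)     # future positions get -inf

# Softmax over each row; -inf entries become exactly 0.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Token 0 can only attend to itself; the last token sees the whole prefix.
print(weights[0])
```

This is why information from later tokens can't flow back: the first pass over the sequence fixes each position's representation using only its prefix. A second pass (or a repeated copy of the input, as in "say it twice") lets every position's representation be conditioned on the full first-pass context.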
[1]https://arxiv.org/html/2512.14982v1