
True, I should probably have been more specific, with 'LLMs' I was just referring to the current Transformer based models.

And yes, we could of course get the right answer to PARITY problems, but only if we were allowed some kind of scratch space (mental or external) -- even if only for the single bit of current state in the sequence.

But that's precisely what Transformers aren't afforded, right? They have to look at the whole problem once and need to derive their solution from that; there's no 'going through the sequence one bit at a time' for them. So they are really good (amazing actually, when you think about it) at performing the 'single thought' action, but still severely limited because they don't have the equivalent of a writable working memory.
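The "single bit of scratch space" point above can be made concrete: computed sequentially, PARITY needs only one bit of writable state, updated once per input bit. A minimal sketch:

    # The entire working memory is a single bit of state,
    # updated as each input bit arrives in sequence.
    def parity(bits):
        state = 0          # one bit of scratch space
        for b in bits:
            state ^= b     # fold the new bit into the state
        return state

    parity([1, 0, 1, 1])   # -> 1 (odd number of 1s)

A fixed-depth Transformer has no such loop over the input; it must produce the answer in one parallel pass, which is exactly where the difficulty comes from.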



The difficulty Transformers have with PARITY seems to come from their heavy reliance on the position encoding: the longer the input sequence, the more likely the model is to attend to the wrong bit, and once that happens all subsequent output is wrong.

Instruction tuned LLMs as chat bots receive multiple inputs in a sequence, interspersed with computation to produce output tokens. Those output tokens appear to be able to act like short term memory.

For example, this ChatGPT 3.5 prompt appears to work well:

    You will act as a parity calculator. Ask me for each bit in turn. The parity value starts at 0, and is updated with xor using each new bit. Print out the updated value after each new bit is processed.
Responding with 0 or 1 produces outputs like:

    Got it, so we'll apply the XOR operation between the current parity value (1) and the new bit (1), which gives us 0.

    The updated parity value is 0.

    What is the next bit?
After ~20 iterations the output looks stable, if a little wordy. Perhaps this works because the tokens to attend to are all recent. I imagine there would need to be some mechanism to remind the model what it is doing if the chat continued indefinitely; otherwise the initial prompt might fall out of context and be forgotten. I've not done any experiments with that yet.
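The protocol the prompt sets up can be sketched as a loop in which each turn's printed parity is the only state carried forward (the function name and reply format here are illustrative, not what ChatGPT literally does):

    # Illustrative simulation of the chat protocol: on each turn
    # the "model" needs only the previously printed parity (its own
    # recent output tokens) and the new bit -- not the full history.
    def respond(printed_parity, new_bit):
        updated = printed_parity ^ new_bit
        return updated, f"The updated parity value is {updated}. What is the next bit?"

    parity = 0
    for bit in [1, 1, 0, 1]:
        parity, reply = respond(parity, bit)
    # running parity after bits 1, 1, 0, 1 is 1

This is why the long-sequence position-encoding problem doesn't arise: the attention needed at each step is only over the last couple of turns.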



