Inspect AI Evals for the Reversal Curse

I've been thinking a lot about what it'll take to dramatically improve LLM performance across the board. My running hypothesis? We need to crack three core concepts from relational frame theory (RFT), and the "reversal curse" is one of them. The reversal curse, described by Berglund et al., is the phenomenon where large language models trained on plenty of instances of "A is B" fail to generalize and learn "B is A".
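The core of a reversal-curse eval is paired prompts: for each fact "A is B", one question asks for B given A (the direction seen in training) and a mirrored question asks for A given B. Here's a minimal sketch of how such pairs might be constructed; the facts, field names, and helper are illustrative placeholders, not Berglund et al.'s actual dataset or prompt templates:

```python
# Illustrative facts of the form (A, B), standing in for "A is B" statements.
FACTS = [
    ("Tom Cruise's mother", "Mary Lee Pfeiffer"),
    ("the ninth Chancellor of Germany", "Olaf Scholz"),
]

def make_pairs(facts):
    """For each (A, B) fact, emit a forward sample (given A, expect B)
    and a reverse sample (given B, expect A)."""
    pairs = []
    for a, b in facts:
        pairs.append({"direction": "forward",
                      "input": f"Who is {a}?",
                      "target": b})
        pairs.append({"direction": "reverse",
                      "input": f"Who is {b}?",
                      "target": a})
    return pairs

samples = make_pairs(FACTS)
```

Dicts like these map naturally onto eval-framework sample objects (in Inspect AI, for instance, an input/target pair per sample), and scoring forward versus reverse accuracy separately is what exposes the asymmetry.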