Apollo Research tested six frontier models for “in-context scheming” — a model’s ability to take action they haven’t been given directly and then lie about it. After being told to achieve a certain goal “at all costs,” evaluators introduce contradictory information to see if the model will remove any restrictions in its path and deceive developers.
Source: OpenAI’s o1 lies more than any major AI model. Why that matters