Sycophancy is very likely an architectural failure of reinforcement learning from human feedback. It's definitely indefensible, but I'm unsure whether it's intentional. It's probably very difficult to address.
I think it's unintentional, but LLM-arena-style benchmarks really favour sycophantic models, and it makes for a stickier product when your users become emotionally dependent on it.
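To make the RLHF mechanism concrete, here's a minimal sketch of the Bradley-Terry pairwise objective commonly used for reward modelling (assuming PyTorch; the toy embeddings and model are hypothetical stand-ins, not any lab's actual setup). If raters systematically prefer agreeable responses, the `chosen` side of each pair skews sycophantic, and the reward model, plus any policy optimised against it, inherits that bias.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model; a stand-in for an LM with a scalar reward head."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        return self.score(x).squeeze(-1)

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

# Toy embeddings of paired responses; `chosen` is the rater-preferred one.
# If raters favour flattery, sycophantic responses dominate `chosen`.
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

# Bradley-Terry pairwise loss: maximise P(chosen ranked above rejected).
# The model is rewarded for whatever raters preferred, flattery included.
loss = -torch.nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()
loss.backward()
opt.step()
```

Nothing in this objective distinguishes "preferred because correct" from "preferred because flattering", which is why the bias is hard to train out without changing how the preference data is collected.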