This instance actually seems more like 'context rot'. I suspect Google is just shoving everything into the context window because their engineering team likes to brag about 10M-token windows, but the reality is that performance degrades pretty badly when you throw in too much stuff.
I would expect even very small models (4B parameters or fewer) to get this question correct.