This article, along with others covering the topic, seem to foster an air of mystery about machine learning which I find quite offputting.
Known as generalization, this is one of the most fundamental ideas in machine learning—and its greatest puzzle. Models learn to do a task—spot faces, translate sentences, avoid pedestrians—by training with a specific set of examples. Yet they can generalize, learning to do that task with examples they have not seen before.
Sounds a lot like Category Theory to me which is all about abstracting rules as far as possible to form associations between concepts. This would explain other phenomena discussed in the article.
Like, why can they learn language? I think this is very mysterious.
Potentially because language structures can be encoded as categories. Any possible concept including the whole of mathematics can be encoded as relationships between objects in Category Theory. For more info see this excellent video.
He thinks there could be a hidden mathematical pattern in language that large language models somehow come to exploit: “Pure speculation but why not?”
Sound familiar?
models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on.
Maybe there is a threshold probability of a positied association being correct and after enough iterations, the model flipped it to “true”.
I’d prefer articles to discuss the underlying workings, even if speculative like the above, rather than perpetuating the “It’s magic, no one knows.” narrative. Too many people (especially here on Lemmy it has to be said) pick that up and run with it rather than thinking critically about the topic and formulating their own hypotheses.
Yep my sentiment entirely.
I had actually written a couple more paragraphs using weather models as an analogy akin to your quartz crystal example but deleted them to shorten my wall of text…
We have built up models which can predict what might happen to particular weather patterns over the next few days to a fair degree of accuracy. However, to get a 100% conclusive model we’d have to have information about every molecule in the atmosphere, which is just not practical when we have a good enough models to have an idea what is going on.
The same is true for any system of sufficient complexity.