A large language model shouldn’t even attempt to do math, imo. They made an expensive hammer that is semi-good at one thing (parroting humans) and now they’re treating every query like it’s a nail.
Why isn’t OpenAI working more modularly, whereby the LLM will call up specialized algorithms once it has identified the nature of the question? Or is it already modular and they just suck at anything that cannot be calibrated purely with brute-force computing?
Precisely because this is an LLM. It doesn’t know the difference between writing out a maths problem, a recipe for a cake, or a haiku. It transforms everything into the same domain and does fancy statistics to come up with a reply. It wouldn’t know that it needs to invoke the “Calculator” feature unless you hard-code that in, which is what ChatGPT and Gemini do, but it’s also easy to break.
Can’t it be trained to do that?
Sort of. There’s a relatively new type of LLM called a “tool-aware” LLM, which you can instruct to use tools like a calculator or some other external program. As far as I know, though, the LLM has to be told to go out and use that external thing; it can’t make that decision itself.
Can the model itself be trained to recognize mathematical input and invoke an external app, parse the result and feed that back into the reply? No.
Can you create a multi-layered system that uses some trickery to achieve this effect most of the time? Yes; that’s what OpenAI and Google are already doing: they recognize certain features of the user’s input and change the system prompt to force the model to output Python code or Markdown notation, which your browser then renders using a different tool.
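For the curious, that multi-layered trick is roughly this shape. This is a toy sketch, not OpenAI’s or Google’s actual pipeline; `call_llm` just stands in for whatever model API sits behind the chat window:

```python
import re

# Toy version of the "recognize the input, then force the model to emit code"
# pipeline described above. All names here are illustrative.
ARITHMETIC_HINT = re.compile(r"\d\s*[-+*/]\s*\d|sum|total|average", re.IGNORECASE)

def call_llm(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("stand-in for a real model call")

def answer(user_prompt: str) -> str:
    if ARITHMETIC_HINT.search(user_prompt):
        # Swap the system prompt so the model emits code instead of prose.
        code = call_llm(
            "Reply ONLY with a single Python expression that computes the answer.",
            user_prompt,
        )
        # The Python runtime, not the model, produces the number.
        return str(eval(code))  # a real system would sandbox this step
    return call_llm("You are a helpful assistant.", user_prompt)
```

The point is that the model never “decides” to do arithmetic; the wrapper decides for it, which is also why it’s easy to break.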
In fairness, the example I’ve seen MS give was taking a bunch of reviews and determining from the text whether each review was positive or negative.
It was never meant to mangle numbers, but we all know it’s going to be used for that anyway, because people still want to believe in a future where robots help them, rather than just take their jobs and advertise to them.
I would rather not have it attempt something that it can’t do; no result is better than a wrong result, imo. Here it correctly identifies that it’s a calculation question, and instead of suggesting a formula, it tries to hallucinate a numerical answer itself. The creators of the model seem to have a mindset that the model must try to answer no matter what, instead of training it not to answer questions that it can’t answer correctly.
As far as I can tell, the Copilot command has to be given a range of data to work with, so here it’s pulling a number out of thin air. It would be nice if the output from this was just “please tell this command which data to use”, but as always it doesn’t know how to say “I don’t know”…
Mostly because it never knew anything to start with.
Wasn’t doing this part of the reason OpenSeek was able to compete with much smaller data sets and lower hardware requirements?
I assume you mean DeepSeek? And it doesn’t look like it; according to what I could find, their biggest innovation was “reinforcement learning to teach a base language model how to reason without any human supervision”. https://huggingface.co/blog/open-r1
Some others have replied that ChatGPT and Copilot are already modular: they use Python for arithmetic questions. But that apparently isn’t enough to be useful.
I feel like the best modular AI right now is from Gemini. It can take a scan of a document and turn it into a CSV, which I was surprised by.
I figure it must have multiple steps: OCR, text interpretation, recognizing a table, then piping the text to some sort of CSV tool.
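If you built that guessed pipeline out of classical parts, it might look something like this; pytesseract is just an assumed OCR backend here, and the whitespace splitting is a stand-in for real table detection:

```python
import csv
import io

from PIL import Image
import pytesseract  # assumes the Tesseract OCR binary is installed

def scan_to_csv(image_path: str) -> str:
    """OCR a scanned page and emit naive CSV, one row per text line."""
    text = pytesseract.image_to_string(Image.open(image_path))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for line in text.splitlines():
        if line.strip():
            writer.writerow(line.split())  # real table detection is much harder than this
    return buffer.getvalue()
```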
They have (well, Anthropic has).
It’s called MCP
Nowadays you just give the AI access to a calculator basically… or whatever other tools it needs. Including other models to help it answer something.
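Roughly the idea, though this is not the actual MCP wire format, just the shape of tool calling: the host shows the model a tool catalogue, and when the reply comes back as a tool call instead of prose, the host runs the tool and feeds the result back in:

```python
import json

# Toy dispatcher. The model's reply is either prose or a JSON tool call
# like {"tool": "calculator", "input": "1+2+3"}.
TOOLS = {
    # eval with empty builtins keeps this toy calculator contained, but you
    # still wouldn't eval untrusted input in a real system.
    "calculator": lambda expression: eval(expression, {"__builtins__": {}}),
}

def handle_model_reply(reply: str) -> str:
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain prose, nothing to dispatch
    if not isinstance(call, dict) or "tool" not in call:
        return reply
    result = TOOLS[call["tool"]](call["input"])
    return f"tool result: {result}"  # in a real loop this goes back into the model's context

print(handle_model_reply('{"tool": "calculator", "input": "1+2+3"}'))  # tool result: 6
```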
I hadn’t heard of that protocol before, thanks. It holds a lot of promise for the future, but getting the input right for the tools is probably quite the challenge. And it also scares me because I now expect more companies to release irresponsible integrations.
You have to admire the gall of naming your system after the evil AI villain of the original Tron movie.
The torment nexus strikes again.
Because we vibes-coded the OpenAI model with OpenAI and it didn’t think this was the optimal way to design itself.
Yep. Instead of focusing on humans communicating more effectively with computers, which are good at answering questions that have correct, knowable answers, we’ve invented a type of computer that can be wrong because maybe people will like the vibes more? (And we can sell vibes)
Because there is no “it” that “calls” or “identifies” anything, much less the “nature of the question”.
This requires intelligence, not a probability over the most likely next token.
You can do maths with words and you can write poems with numbers; either requires actually understanding, and that’s the linchpin, rather than parsing and producing an answer based on what has been written so far.
Sure, the model might string together tokens that sound very “appropriate” to the question, in the sense that they fit within the right vocabulary, and if the occurrence was frequent enough in its dataset it might even be correct, but that’s still not understanding even a single word (or token) of either the question or the corpus.
An LLM can, by understanding some features of the input, predict a categorisation of the input and feed it to different processors. This already works. It doesn’t require anything beyond the capabilities LLMs actually have, and it isn’t perfect. It’s a very good question why this hasn’t happened here; an LLM can very reliably give you the answer to “how do you sum some data in python”, so it only needs to be able to do that in Excel and put the result into the cell (something like the snippet below).
There are still plenty of pitfalls. This should not be one of them, so that’s interesting.
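The snippet in question really is this small, even with a text-typed “1” like in the screenshot (the cell values here are made up):

```python
# Sum a handful of "cell" values, coercing text like "1" to a number first.
cells = ["1", 2, 3]                   # made-up values mirroring the screenshot
total = sum(float(c) for c in cells)
print(total)                          # 6.0, not 15
```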
OpenAI already makes it write Python functions to do the calculations.
This is the right answer.
LLMs have already become this weird mesh of different services tied together to look more impressive. OpenAI’s models can’t do math, so they farm it out to Python for accuracy.
So it’s going to write Python functions to calculate the answer, where all the variables are stored in an Excel spreadsheet, a program that can already do the calculations? And how many forests did we burn down for that wonderful piece of MacGyvered software, I wonder.
The AI bubble cannot burst soon enough.
Too Big To Fail. Sorry. 1+2+3=15 now.
“1”+(2+3) is “15” in JavaScript.
I’m wondering if this is a sleight-of-hand trick by the poster, then.
If they typed the “1” cell as Text and left the 2 and 3 as numeric, then ran Copilot on that, it’s more an indictment of Excel than of Copilot, strictly speaking. The screenshot doesn’t make clear which cells are numbers and which are text.
I don’t think there’s an explanation that doesn’t make this Copilot’s fault. Honestly, JavaScript shouldn’t allow math between numbers and strings in the first place: “1” + 1 is not a number, and there’s already a type for that: NaN.
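For what it’s worth, Python already behaves the way that comment wants JavaScript to: mixing a string with a number is an error, not silent concatenation, and you have to convert explicitly to get a sum:

```python
# "1" + 1 is a TypeError in Python, not "11" and not NaN.
try:
    "1" + 1
except TypeError as error:
    print(error)           # can only concatenate str (not "int") to str

print(int("1") + 2 + 3)    # 6, once the text is explicitly converted
```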
Regardless, the sum should be 5 if the first cell is text, so it’s incorrect either way.
You can explicitly convert values (or lean on TypeScript) to avoid this if you really don’t want concatenation to be the default. So, again, this feels like an oversight in the integration rather than in any strict formal logic.