Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities
Apple's AI research team has uncovered significant weaknesses in the reasoning abilities of large language models, according to a newly published study. MacRumors: The study, published on arXiv [PDF], outlines Apple's evaluation of a range of leading language models, including those from OpenAI, Meta, and other prominent developers, to determine how well these models could handle mathematical reasoning tasks. The findings reveal that even slight changes in the phrasing of questions can cause major discrepancies in model performance that can undermine their reliability in scenarios requiring logical consistency.
Apple draws attention to a persistent problem in language models: their reliance on pattern matching rather than genuine logical reasoning. In several tests, the researchers demonstrated that adding irrelevant information to a question -- details that should not affect the mathematical outcome -- can lead to vastly different answers from the models.
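A minimal sketch of the kind of perturbation the study describes: appending a detail that is mathematically irrelevant and checking whether the model's answer changes. (This is an illustrative harness, not Apple's actual benchmark code; the kiwi problem echoes an example discussed around the paper.)

```python
# Sketch (assumption: not the study's real code) of an "irrelevant detail"
# perturbation for a grade-school math word problem.
def perturb(question: str, irrelevant: str) -> str:
    """Append a clause that should not affect the mathematical outcome."""
    return question.rstrip("?") + ", and " + irrelevant + "?"

base = ("Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
        "How many kiwis does he have?")
noise = "five of the kiwis are a bit smaller than average"

# Feed both `base` and `perturb(base, noise)` to a model; a reasoner
# should return the same number for both.
print(perturb(base, noise))
```

The finding, as summarized above, is that models frequently change their answer on the perturbed variant even though the arithmetic is untouched.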
Uh - duh? (Score:3, Informative)
AI does not reason. It predicts word ordering. Reasoning requires knowledge bases with semantic knowledge and analysis. Word ordering just puts jumbles of symbols in order.
Re: (Score:3)
Turns out, a lot. But, it is still fundamentally limited by the whole starting point of building the best auto-complete.
Re: (Score:3)
"Trained" is itself the wrong terminology. "Training" implies learning, which implies intelligence. LLMs are a giant statistical-probability database with an impressive depth of connection between each individual tokenized node, but nowhere in there does any actual intelligence or reasoning ability exist.
The whole term "artificial intelligence" is the problem. It, and the use of terms like "training," lead people to anthropomorphize what they shouldn't.
Re: (Score:3)
Bit of an old man yelling at clouds here. Programming relies on a lot of metaphors to help us understand the purpose of things.
I do not think semaphores are using little colored flags to control my threads,
which I do not believe to be strings bound on spools to divide my jobs,
which I do not believe to be gainful employment on the part of my code.
And:
Objects are not things I can hold.
Models are not toy planes
Servers don't bring you your food
Links are not part of a chain
Calling functions does not require a p
Re: Uh - duh? (Score:2)
The whole term "artificial intelligence" is the problem
It's a term that never really had any practical meaning other than a program that responds to inputs. In the 80s and 90s, AI was your chess opponent, which basically did fancy heuristics with a static ruleset. It never was intelligent, and still isn't. When most companies describe their product as AI, it's not even LLM, it's just a variation of the ol' chess opponent.
Though I'd have to slightly disagree about your training comment. For LLM, yes, it's not training so much as just adding data points for the d
Re: Uh - duh? (Score:2)
Bleh, correction: Solely NN
Re: (Score:2)
"Training" is an accurate and correct term to use here. Not only is it the common terminology in the machine learning field for decades, but it describes what is happening.
LLM models aren't just databases, they're weighted neural networks that will produce a given result based on a given input. The training is to adjust the weights to properly produce the result. Without the training, the model produces gibberish.
Re: (Score:2)
Re: (Score:2)
That's why I call it artificial ignorance.
You would have to program it to be this stupid. [reddit.com] /s
Reason (Score:1)
There is no reasoning. It's pattern matching based on keywords and weights feeding into Markov chains. Most LLMs also have some inferencing ability hardwired in there by humans, but they don't make those inferences on their own.
Re:Reason (Score:5, Insightful)
The funny thing is... Somehow our ability to reason is an emergent property of weighted connections in a network. Because we don't understand how that happens, we don't know why it isn't happening with the AI we have created, or if it's even possible with the setups we're using. We also don't know if it's impossible for a sufficiently complex version of an existing AI system to do it.
Probably impossible, I suspect there's more than just 'embiggen it and it will happen'.
Re: (Score:3)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
If you have a mind, it was trained over many years to take care of, protect, and feed its own physical shell. How much would this mind be worth without that shell? How similar would another mind be if it was trained without any need to take care of its physical shell? Without that shell? Without eyes, without ears, without hands and legs? I am sure it will be quite d
Re: (Score:2)
Idiocracy bucket problems (Score:2)
If I gave you a 5 gallon bucket and a 2 gallon bucket, how many buckets did I give you?
Re: (Score:2)
Re: (Score:2)
>> Somehow our ability to reason is an emergent property of weighted connections in a network.
Our weighted network has been honed by hundreds of millions of years of evolution, it's not just random.
But even more importantly, the ability to reason is also an emergent property of years and years of the network _actually interacting with the environment_ through seeing, touching, hearing and so on. That is VERY different than just seeing and predicting words.
Re: (Score:2)
By stating "Somehow our ability to reason is an emergent property of weighted connections in a network" you are making a very strong assumption about what produces the ability to reason in humans. Begging the question is another way to say you are assuming the conclusion.
Once you assume the strong conclusion that reason is an emergent property of a simple kind of network, then you are trapped in the paradoxical observation that reason is not in fact em
Re: (Score:2)
we don't know why it isn't happening with the AI we have created, or if it's even possible with the setups we're using.
It is explicitly impossible to create reasoning out of the "AI" techniques that we have today. The techniques that we have will likely be a part of a 'reasoning' AI, but there is nowhere near enough infrastructure to support reasoning currently.
Re: (Score:2)
There is no reasoning.
Correct.
It's pattern matching based on keywords and weights feeding into Markov chains.
Incorrect. LLMs are non-Markovian.
Re: (Score:3)
Even if you were to consider RAGs instead of pure chatbots, you would still have a Markov chain in an external environment, which makes them Markov Decision Processes (MDPs or even POMDPs).
The important bit is the Markovian structure, which arises because the input window is limited and the system has well defined state transition probabilities.
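The "window as state" argument above can be sketched with a toy process (not an actual LLM; the vocabulary and window size here are made up): if the next-token distribution is a fixed function of the last k tokens only, the chain over k-token windows satisfies the Markov property.

```python
import random

K = 3  # toy context window size (assumption for illustration)

def next_token(window: tuple) -> str:
    # The distribution depends only on the current window-state, which is
    # exactly the Markovian structure the parent comment describes.
    rng = random.Random(hash(window))
    return rng.choice("abcd")

def step(window: tuple) -> tuple:
    """One state transition: emit a token, slide the window."""
    return (window + (next_token(window),))[-K:]

w = ("a", "b", "c")
assert step(w) == step(w)  # same state, same transition behavior
```

Whether a real deployed system (with RAG, tool calls, and an effectively unbounded context) still fits this framing is, of course, the point being argued in the replies below.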
Re: (Score:2)
You should get a refund for your CS degree. It's clearly defective.
LLMs very obviously violate the Markov property: P(X(t+1) = s | X(t) = s_t, ..., X(0) = s_0) = P(X(t+1) = s | X(t) = s_t)
More on this after the break...
Even if you were to consider RAGs instead of pure chatbots, you would still have a Markov chain in an external environment
Not according to you. By introducing RAG, you no longer have "well-defined state transition probabilities" and your "input window" is arguably extended indefinitely. Oops!
You're making the same mistake that every undergrad makes when they confuse the practical with the theoretical and completely miss the point.
Re: (Score:2)
The formula you give is not the correct definition, you're missing a whole class of other cases. But given your attitude, I'll let you figure it out for yourself.
Re: (Score:2)
I'll let you figure it out for yourself.
Translation: "I can't contradict anything you've written, but I don't want to admit that I was wrong"
What a joke!
Re: (Score:2)
Dupe di-dupe di-dupe di-dupe dupe dupe (Score:5, Insightful)
And please stop claiming "faults" in "LLM reasoning abilities". LLMs have no reasoning abilities and pattern matching is not a valid substitute.
Re: (Score:1)
Re: (Score:2)
Hmm. I do admit I sometimes forget the low "reasoning ability" level many people operate on.
Re:Dupe di-dupe di-dupe di-dupe dupe dupe (Score:4, Interesting)
Question is, do Slashdot editors have enough reasoning abilities, considering the dupefest here?
Re: (Score:2)
Maybe Apple could do a study revealing the critical flaws in Slashdot editors' "reasoning" abilities?
Re: (Score:2)
Re: (Score:2)
The headline/description is garbage, but Apple needs to temper people's expectations when Sam Altman is writing things like:
"... it's very possible that creativity and what we think of as human intelligence are just an emergent property of a small number of algorithms operating with a lot of compute power"
and
"We decry current machine intelligence as cheap tricks, but perhaps our own intelligence is just the emergent combination of a bunch of cheap tricks."
Even Mira Murati's papers point in th
This needed a study? (Score:2, Offtopic)
I expect we'll see a response from Sam Altman and his ilk within days talking about how reasoning ability is overrated anyway, and the artificial intelligence is superior to supposed "real" intelligence on such a level that we simply aren't equipped to understand the reasoning ability of such a superior creation.
My god, this is stupid. Reasoning ability in LLMs? You might as well say every database in existence has reasoning ability just because you can type a somewhat English-looking phrase in (SELECT * FROM $F
Re: (Score:1)
Sorry, but you went there. I couldn't resist.
https://xkcd.com/327/ [xkcd.com]
Non deterministic (Score:1)
Re: (Score:3)
What "non deterministic nature"? And why are "guarantees" of "results" important?
"For example, If the AI sends a 1 in a million mass email that is highly offensive, the AI producer/maintainer probably has language stating they're not liable."
They'll have that anyway. It's a problem of legal accountability, not a characteristic of LLMs that you cannot accurately describe.
Re: (Score:2)
I believe that they are fully deterministic, but generation runs are seeded with random numbers intentionally.
Re: (Score:1)
2+2=4. Fully deterministic. Always yields same results given 2+2=? as input.
Vs.
LLM given same user input multiple times yielding different results each time? Non deterministic.
By definition if a random number generator is a key part of your algorithm, it is not deterministic. This should be self evident.
Re: (Score:2)
I'll use image generators as an example because even though it's a different algorithm, they work in a lot of the same ways.
You put in a text prompt and get a different image every time, right? No. You can re-run the same prompt with the same seed and get exactly the same picture out of it. You just have to have control over the model to enable that. So maybe not Bing Image Generator but definitely Stable Diffusion.
It's pseudorandom numbers, so yes - it's deterministic.
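The seed argument above can be shown with a toy sampler (a stand-in, not a real image or text generator): fix the seed and the "random" output is fully reproducible.

```python
import random

def sample_tokens(seed: int, n: int = 5) -> list:
    # Toy "generation" loop: with the seed pinned, every run replays the
    # same pseudorandom sequence, so the output is deterministic.
    rng = random.Random(seed)
    vocab = ["the", "cat", "sat", "on", "mat"]
    return [rng.choice(vocab) for _ in range(n)]

assert sample_tokens(42) == sample_tokens(42)  # same seed, same output
```

This is the Stable Diffusion point in miniature: the randomness is an input you can control, not an inherent property of the algorithm.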
Re: (Score:2)
LLM given same user input multiple times yielding different results each time?
If you're talking about giving the same prompt to a chat bot twice in a row, I should point out that you're not presenting the same input multiple times.
By definition if a random number generator is a key part of your algorithm, it is not deterministic.
Slow down there, cowboy. We need to have a little talk about determinism, random number generators, and what constitutes an input. For clarity, when discussing LLMs specifically, I'm going to separate the model, the bit that spits out a list of probabilities, from the bit that selects the output token and the loop.
A function is deterministic if it always pro
Re: (Score:1)
Ok, fair point when taken at the unit level. I'll buy that. Yes, if I control the entire system, including any random seeds, the model doesn't change from previous inputs, etc, then yes, I should get the same output each time. Agreed.
But, if we look at the things from the outside the way most users will interact, the system as a whole is not under their control and the output they get for the same input is not guaranteed to be the same on subsequent runs.
So, yes, to a researcher they are (or can be made
Re: (Score:2)
I was going to post roughly the same thing, but you covered it so well, no need. I hope you get modded up.
But, it made me think of a few additional points:
1
I have a post below, "Similarity to fractals and NLD" in which I explain that I have no experience with language AI, but I do with imagery AI. So, going off of that, when a stable diffusion image is generated, it starts with a field of randomly generated noise, then the prompts and the language model seek patterns emerging. That gives you roughly 1024
Re: (Score:2)
Re: (Score:1)
$x = time();
print $x;
Deterministic?
LLM using time() or rand()... deterministic?
Re: (Score:2)
Yes. They always produce the same output for any given input. That makes them deterministic.
Again? (Score:2)
They did the same a couple of days ago:
https://apple.slashdot.org/sto... [slashdot.org]
Re: (Score:2)
Apple is thorough.
Slashdot editors, not so much.
Re: (Score:2)
"Hey, LLM, has this article been posted already?"
See, AI could improve /. Maybe it's only as smart as a cat but if that cat can spot dupes that's something editors miss.
Humans use cats to hunt mice too. Not because cats are good at anything else but being mean, but they excel at that. Same with LLM pattern matching.
Apple always shits on tech they're way behind on - until they "revolutionize" it and it's the next best thing. Remember when fanbois were worshiping the Lightning Cable?
They'll snap-to on AI
Not able to reason. (Score:2)
"Generative AI" is simply not capable of what we would universally consider reasoning. LLMs and other "reflexive" pattern-matching systems may be a stepping stone on the way to AGI, or, they may be a cul-de-sac, and won't have anything at all to do with AGI, if such a thing ever comes to be.
I really question this. (Score:2)
I mean, take any formal math proof. You have a set of transformations you can make to existing statements, a set of existing statements, and you apply them to get the form you want. All of this is realizable within a neural network, so any output can only be the product of an input plus a transformation.
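The "statements plus transformations" picture can be made concrete with a toy derivation (purely illustrative; the rule and axioms are made up): start from a set of statements and apply a rewrite rule until the target appears.

```python
# Toy sketch of proof-as-transformation: derive new equalities from old
# ones by a single transitivity rule.
axioms = {"a=b", "b=c"}

def transitivity(stmts: set) -> set:
    """Rule: from x=y and y=z, derive x=z (one pass)."""
    derived = set(stmts)
    for s in stmts:
        for t in stmts:
            x, y = s.split("=")
            y2, z = t.split("=")
            if y == y2:
                derived.add(f"{x}={z}")
    return derived

print("a=c" in transitivity(axioms))  # the target statement is derivable
```

Whether a neural network's input-plus-transformation pipeline amounts to the same thing as rule application in a formal system is exactly the open question the comment raises.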
More Discussion on this from 2 Days Ago (Score:2)
https://apple.slashdot.org/sto... [slashdot.org]
The critical flaw is that... (Score:2)
...they have NO reasoning ability
It's all statistics and clever math
I tried it, they are right (Score:1)
I ask how much is 3+5?
If I change just one character, the '3' to '4', I get a completely different answer.
Novel thought (Score:2)
IQ Test (Score:3)
Similarity to fractals and NLD (Score:2)
I have not studied AI in great technical detail. I am not completely stupid on the subject, but just pretty basic. My comment here comes not from detailed knowledge or insight about AI, but from something that I do know intimately and deeply well - fractals and non-linear dynamics.
Also, I have not used ChatGPT or any other text writing AI agent. But, I have messed around a bit with the image generators, to see what it is all about. On the image-generation subject, I have studied the AI process a little
Have they tried no changes? (Score:2)
In my experience you don't even need to change prompts to get wildly different results. Simply dump context and try again.
Should Have Read... (Score:1)