Study Done By Apple AI Scientists Proves LLMs Have No Ability to Reason (appleinsider.com) 233

Slashdot reader Rick Schumann shared this report from the blog AppleInsider: A new paper from Apple's artificial intelligence scientists has found that engines based on large language models, such as those from Meta and OpenAI, still lack basic reasoning skills.

The group has proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various large language models (LLMs). Their initial testing reveals that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models. The group investigated the "fragility" of mathematical reasoning by adding contextual information to their queries that a human could understand, but which should not affect the fundamental mathematics of the solution. This resulted in varying answers, which shouldn't happen...

The study found that adding even a single sentence that appears to offer relevant information to a given math question can reduce the accuracy of the final answer by up to 65 percent. "There is just no way you can build reliable agents on this foundation, where changing a word or two in irrelevant ways or adding a few bits of irrelevant info can give you a different answer," the study concluded... "We found no evidence of formal reasoning in language models," the new study concluded. The behavior of LLMs "is better explained by sophisticated pattern matching" which the study found to be "so fragile, in fact, that [simply] changing names can alter results."
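To make the kind of perturbation described above concrete, here is a minimal sketch of the idea (my own illustration, not the paper's actual GSM-Symbolic generator): a templated grade-school question whose names and numbers can be resampled and which can have an irrelevant clause appended. The ground-truth answer is unchanged by the distractor, which is exactly the invariance the study checks the models against.

```python
import random

# Hypothetical GSM-style template (an illustration, not the paper's data).
# The ground-truth answer depends only on the numbers, never on the names
# or on the appended distractor sentence.
TEMPLATE = ("{name} picks {n1} apples on Monday and {n2} apples on Tuesday. "
            "How many apples does {name} have?")

DISTRACTORS = [
    "Five of the apples picked on Tuesday were a bit smaller than average.",
    "{name}'s neighbor also likes apples.",
]

def make_variant(seed: int):
    rng = random.Random(seed)
    name = rng.choice(["Sophie", "Liam", "Noor", "Mateo"])
    n1, n2 = rng.randint(2, 40), rng.randint(2, 40)
    base = TEMPLATE.format(name=name, n1=n1, n2=n2)
    perturbed = base + " " + rng.choice(DISTRACTORS).format(name=name)
    return base, perturbed, n1 + n2  # same expected answer for both prompts

if __name__ == "__main__":
    base, perturbed, answer = make_variant(seed=0)
    print(base)
    print(perturbed)
    print("expected answer:", answer)
```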

  • Duh (Score:5, Insightful)

    by locater16 ( 2326718 ) on Sunday October 13, 2024 @04:53PM (#64861565)
    "Reasoning" was never part of the fundamental LLM model. But if you brute force it enough it'll do something kinda cool, which is enough to get money, which is enough to get thousands upon thousands of people brute forcing it, fundamentals be damned.
    • Re:Duh (Score:5, Insightful)

      by gweihir ( 88907 ) on Sunday October 13, 2024 @04:57PM (#64861571)

      You beat me to it. LLMs cannot reason and will never be able to reason. The very approach does not allow it. Obviously, it can regurgitate reasoning steps it has seen in its training data with not very high reliability, but that is not reasoning. That is faking it. Since most people only have very limited or no reasoning ability themselves, they are then deeply impressed by the fake.

      • Re:Duh (Score:5, Insightful)

        by AleRunner ( 4556245 ) on Sunday October 13, 2024 @06:02PM (#64861711)

        You beat me to it. LLMs cannot reason and will never be able to reason. The very approach does not allow it. Obviously, it can regurgitate reasoning steps it has seen in its training data with not very high reliability, but that is not reasoning. That is faking it. Since most people only have very limited or no reasoning ability themselves, they are then deeply impressed by the fake.

        Fundamentally I agree, but devil's advocate here. You haven't said what this "reasoning" thing is. How do you know it isn't just a feature of a larger model?

        What they have done here is to assert the existence of a class of problems which they say require reasoning to solve efficiently. They can create a more or less infinite set of such examples and manipulate them so that systems with "reasoning" (say graduate students as in comments below) can solve them easily and quickly and LLMs cannot.

        That's a reasonably useful definition of what reasoning is, which is a good thing.

        • by r0nc0 ( 566295 )
          Wish I had mod points for that response.
        • Re:Duh (Score:4, Informative)

          by sg_oneill ( 159032 ) on Monday October 14, 2024 @12:44AM (#64862369)

          Yeah this is where my philosophy grad brain kicks in to protest a little bit.

          There's a whole series of words used in AI, and by the public in general, that are "you know what I mean" fuzzy words. But these words are *terrible* for the practice of science.

          Take everyone's favorite, "consciousness". You know intuitively what it is, you're doing it right now, but try to define it? Not so easy without ending up in tautological loops. The best we can really come up with is something like "paying attention to something" (Heidegger had a variation claiming we are always conscious OF something, never just conscious in itself). This leads to problems when trying to systemise it into science. We might point to a reading on an EEG or whatever and say "look! that there is what consciousness looks like". Fine, but that doesn't tell us what consciousness *is*, it only tells us what it looks like under a microscope.

          The truth is, what we are talking about is phenomenological. We *experience* consciousness. It's something that we perceive, we observe ourselves being conscious, but trying to exfiltrate that perception out of the mind into something that is concretely and repeatably definable in terms of the levers and dials of the brain is a VERY shaky thing; at some point we're still reduced to saying "you know what I mean".

          The same goes for terms like "intelligence", "reasoning", "morality", "values", "cognition", etc. We instinctively know what these mean as human beings who experience these things, but nailing down fixed definitions that everyone can agree on, that can be objectively measured, and that have a theory of physical action coupled to them, well, that's a much more complicated matter, and it's not one I think we have the language to properly interrogate right now.

          • by gweihir ( 88907 )

            The truth is, what we are talking about is phenomenological. We *experience* consciousness.

            Obviously. There is no known way how a physical mechanism can create consciousness. Hence attempts to define it relative to the physical world must fail. Whether we eventually get an extension of known Physics or whether consciousness will remain "magic" is to be seen. Note that Science does not rule out "magic". It just requires extraordinary evidence for its existence. Consciousness has that going for it.

            The funny thing is that consciousness can influence physical reality (we can talk about it). Hence known Physics is known to be grossly incomplete in this regard.

            • by Rei ( 128717 ) on Monday October 14, 2024 @07:45AM (#64863025) Homepage

              I agree with your description of "reasoning", but IMHO it implies that the definition of intelligence is just "reasoning not done by humans", rather than having some other innate, qualitatively different property - otherwise, you'd be able to define that property. It's the AI Effect [wikipedia.org], basically. Whatever property you choose - and people have chosen many over the years - people will immediately back off from as soon as a machine gets good at it, and try to find some other property elsewhere. It's the God of the Gaps, except that gap search just gets ever harder over time (not that that will stop people from trying).

              IMHO, the rest of what you wrote is beyond weak. Like, just to pick an example: "The funny thing is that consciousness can influence physical reality (we can talk about it). Hence known Physics is known to be grossly incomplete in this regard." Literally everything physics does is about things that can influence physical reality. So how exactly is this any different from literally any other physics?

          • Consciousness creates the illusion of unification, but it is really a distributed process. There is no homunculus, no central understander. Instead there is consistent semantic space, like embeddings generated by neural nets. This semantic space emerges from relating data points against each other, it is relational. Data becomes its own representation space, and forms a first person perspective by training it on the experiences of one agent.

            The second factor is the serial action bottleneck. Even though o
        • by gweihir ( 88907 )

          Reasoning involves, at the very least, some fact-checking ability and some deductive capabilities. LLMs have neither and cannot have either. They just do not have any base-mechanisms for them. It is not a question of training-data either.

          • Brains in isolation are in the same situation - can't fact check anything. Only when put inside bodies, which are inside our physical environment that they can search, discover and validate. On the other hand LLMs enjoy indirect agency - 200M users come to chatGPT with their problems and tasks, and the LLM interacts with users, tells them what to try, the also receives feedback on how it worked out down the line. Users need to iterate, but that means LLMs can collect feedback through them. Scaled to trillio
          • by Rei ( 128717 )

            You can literally make up and randomize logic problems using entirely fictional terms, with distraction sentences, and LLMs will still solve them. Not perfectly, especially as the complexity of the task grows, but they very much have deductive capabilities. This is well established.

            Heck, even individual neurons have deductive capabilities. NNs at their most basic level are fuzzy logic engines. Every neuron subdivides its inputs with a fuzzy hyperplane, in effect asking a superposition of questions and yie
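The "fuzzy logic engine" framing above is easy to sketch: a single sigmoid unit with hand-picked weights behaves like a soft AND gate over its inputs. This is a toy illustration of that framing, not a claim about how trained LLM weights actually look.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fuzzy_and(a: float, b: float) -> float:
    # One "neuron": a weighted sum plus bias, squashed by a sigmoid.
    # The weights place a soft hyperplane so the output is high only
    # when both inputs are high.
    w1, w2, bias = 10.0, 10.0, -15.0
    return sigmoid(w1 * a + w2 * b + bias)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(f"AND({a}, {b}) ~ {fuzzy_and(a, b):.3f}")
```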

        • This works both ways. We can create problems where LLMs excel and humans fail. It just shows there are many skills and some are learned better by humans and others are learned better by AI agents. For example:

          - Generating synonyms - humans can come up with fewer synonyms than LLMs; imagination fails, we get writer's block. In one project I worked on, for example, I generated synonyms with LLMs and filtered them with humans for best results.
          - Generally you would have better chances with an LLM at recalling facts, they
          • Another example where AI shines - doing massive search to learn - most of the Alpha family (AlphaZero, AlphaTensor, AlphaProof, AlphaCode) surpass humans, some surpass all humans while others reach expert levels.
          • Not really, it doesn't work both ways. Consider a human. Consider a problem that the human putatively "fails" at. Follow these steps:

            1) let the human invent an LLM.

            2) let the human ask this LLM for the answer to the problem.

            3) let the human report the answer to the test adjudicator.

            This is a proof that the human can always solve any problems that LLMs excel at. In other words, there are NO problems where LLMs excel but humans fail. (And obviously TFA already exhibits problems that the LLMs fail at w

            • by gweihir ( 88907 )

              Indeed. All the "LLMs can think" proponents are demonstrating is that _they_ cannot do so successfully.

        • by Rei ( 128717 )

          For the record, this study has been widely criticized.

          First off, and a common sin: there are no human controls. They start out with (in some cases) human tests, and the models do well on them, but then they modify the tests to be harder and the models do worse - yet they just assume that human performance would stay the same. The simple fact is that the sort of changes they make - changing the meanings of words, inserting distraction statements, etc. - also increase the rate at which humans make errors. It's Karen's Rul

      • Re:Duh (Score:5, Interesting)

        by hsthompson69 ( 1674722 ) on Sunday October 13, 2024 @07:09PM (#64861815)

        It's even worse than that.

        As more people use LLMs, more content will be LLM generated - and LLMs can't tell the difference between generated content (which is inappropriate to train on), and "real" content that may actually include some percentage of actual reasoning steps that could be used to fake some % of reason.

        AI will ultimately destroy itself by devouring its own tail.

        • by dfghjk ( 711126 )

          how do you know that generated content is "inappropriate to train on"? AI is in its infancy and every aspect is under rapid development. Some say that generated content may be problematic, but so called experts demonstrate a lot of basic ignorance constantly. For all we know, that could soon not be a problem at all, it's certainly not a problem in this context.

          • by narcc ( 412956 )

            how do you know that generated content is "inappropriate to train on"?

            This is a well-understood phenomenon. It's also intuitively obvious. I even described it in an off-hand way here months before the paper that coined the term "model collapse".

            but so called experts demonstrate a lot of basic ignorance constantly

            So... we should listen to people without any relevant knowledge or experience? If that's what you're after, LessWrong has no shortage of uneducated crackpots. You'll fit right in.

            • Model collapse means you are doing it wrong. Of course, if you lock an LLM in a loop of training on its own outputs without quality checks and validation, you end up in a garbage situation. But if you include some way to test the LLM output - the way AlphaProof tested the correctness of its solutions with Lean, or AlphaZero tested by playing out the game to the end to see which strategy was better - they don't need to collapse, because they can latch onto a validation signal. And that kind of signal can also be provid
              • More fundamentally it's a problem of search, and of course search can't happen without access to the search space. Humans (re)search, LLMs search as well. Discoveries come from the search environment not from the brain or model.
          • Re:Duh (Score:4, Informative)

            by gweihir ( 88907 ) on Monday October 14, 2024 @02:39AM (#64862503)

            1. AI is not "in its infancy" at all. It is a research area that is at least 70 years old with intensive efforts along the way. There are no "low hanging fruits" left.
            2. Model collapse is a proven thing.

            At least get the _basics_ right. Yes, I get that you "want to believe", but that does not make for valid arguments.

          • Re:Duh (Score:5, Informative)

            by Ambassador Kosh ( 18352 ) on Monday October 14, 2024 @03:15AM (#64862583)

            Actually this is pretty widely studied at this point. It is one of the reasons that AI companies have pushed for marking AI-generated content. At the most basic level, LLMs are probability models: they predict the next most probable word given what has come before. If you think of it like a Gaussian peak, they are trying to select near the peak. If you train an LLM on the output of an LLM, you are training it on just the peaks, and you get a kind of focusing effect; after only a couple of generations the model collapses.

            It turns out you need the low probability information to keep the model stable and it can't generate that information.
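A toy simulation of that focusing effect (my own sketch, using a single Gaussian as a stand-in for a model's output distribution): each "generation" keeps only samples near the peak of the previous model and refits to them, and the estimated spread collapses within a handful of generations.

```python
import random
import statistics

def next_generation(mu: float, sigma: float, n: int = 10_000):
    # Sample from the current "model", keep only the high-probability region
    # (a crude stand-in for a model preferring likely continuations), refit.
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    kept = [x for x in samples if abs(x - mu) < sigma]  # drop the tails
    return statistics.mean(kept), statistics.stdev(kept)

mu, sigma = 0.0, 1.0
for generation in range(6):
    print(f"generation {generation}: sigma = {sigma:.3f}")
    mu, sigma = next_generation(mu, sigma)
```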

          • how do you know that generated content is "inappropriate to train on"? AI is in its infancy and every aspect is under rapid development. Some say that generated content may be problematic, but so called experts demonstrate a lot of basic ignorance constantly. For all we know, that could soon not be a problem at all, it's certainly not a problem in this context.

            It's not just an issue of AIs being trained on crappy AI generated content. Humans also generate a lot of content that is garbage for AI training purposes. The point here is that well trained and educated humans can filter out garbage content (regardless of whether it is human or AI generated) but current AIs can't do that very well. This is a problem since you are going to need a whole lot of high quality content to train a high quality AI, unless you are happy with your AI functioning on the garbage in, g

        • by gweihir ( 88907 )

          Obviously. I have been saying that for quite some time. At some point, training an LLM on public data will become infeasible due to model collapse. At the same time, generating enough non-public training data is already infeasible and will remain so. Hence what will be left is some old models that get more and more outdated and will probably hallucinate more and more, given that the questions are current and the distance from the general context the model was trained in keeps growing.

        • When I generate code with a LLM I test it by execution. When I generate an answer or article with a LLM I filter it through my personal experience. When I post something online I edit it first, to be grounded in my positions. I think there is a respectable layer of human filtering around AI outputs. AI might have a contribution, but it combines with my own experience and the outcomes of our work, so it gets grounding while being used with human in the loop.
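That generate-then-verify workflow is easy to caricature in code. In the sketch below, propose_solution is a hypothetical placeholder for whatever LLM call is used (here it just returns canned candidates so the example runs on its own); the point is that candidates are only accepted if they pass executable tests, so an unreliable generator sits inside a reliable validator.

```python
from typing import Optional

def propose_solution(attempt: int) -> str:
    # Hypothetical stand-in for an LLM call; canned candidates keep the
    # example self-contained and runnable.
    candidates = [
        "def add(a, b):\n    return a - b",   # buggy attempt
        "def add(a, b):\n    return a + b",   # correct attempt
    ]
    return candidates[attempt % len(candidates)]

def passes_tests(source: str) -> bool:
    # Validate by executing the code against known cases, not by trusting
    # the generator's own confidence.
    namespace: dict = {}
    try:
        exec(source, namespace)
        add = namespace["add"]
        return add(2, 3) == 5 and add(-1, 1) == 0
    except Exception:
        return False

def generate_validated(max_attempts: int = 5) -> Optional[str]:
    for attempt in range(max_attempts):
        candidate = propose_solution(attempt)
        if passes_tests(candidate):
            return candidate
    return None

print(generate_validated())
```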
      • If you want exact reasoning, you provide a tool for that. Like Lean, which can do math precisely, guided by an LLM, with no mistakes. Or code execution, which can be validated to be correct. The scenario discussed here is basically like asking humans to do math without pen and paper, and with no references - we make errors too. We make errors even when multiplying large numbers, even though the process is so simple to understand. When we write code we use the compiler to help us fix errors; we can't wr
        • by gweihir ( 88907 )

          There are no tools for automated reasoning in existence that can do any real depth. Well, there are, but they cannot perform due to state-space explosion. This is a _fundamental_ limit and it cannot be circumvented.

          Hence your statement is bullshit.

    • by Kisai ( 213879 )

      Water is wet, ducks report.

      I know the mundane, non-tech-literate people need to be told this, but every single LLM "AI" thing out there is not intelligent. It's a trained parrot. The parrot does not understand English; it does not "speak", it "mimics".

      • Parrots can very well understand and speak English, with vocabularies in the range of 800 to 1,000 words.
        There is plenty of research about that.

      • Re: Duh (Score:3, Funny)

        by jobslave ( 6255040 )

        There is nothing intelligent about today's so-called AI. It's nothing more than massive brute-forced machine learning. Sufficiently advanced tech will appear to be intelligent or even magical, when in fact it's just doing what it was programmed to do. Nothing more, nothing less. Non-living beings will never be able to reason or have emotions. Machines, regardless of the amount of data and programming we toss at them or program them to consume, will never become sentient. They may appear that way in some sup

        • Are you somehow a bio-machine yourself? Which chemical reactions secrete feelings? This approach is essentialist and centralized, and wrong. Consciousness, feelings, understanding - they are distributed across systems not centralized things. The external environment is crucial in feelings, they are always about things we experience.
      • by mbkennel ( 97636 )

        Parrots and corvids can reason and solve novel problems.

        • And self reproduce without human help. Tell that to the LLMs. They can't even make their own GPUs or power plants, not to mention the initial training set.
      • When you go to the doctor, do you first study medicine? Or just tell your symptoms and get the diagnosis and treatment? If the latter, it means you are using a system you don't understand. You are parroting a solution - "tell the doctor where it hurts to get help" instead of genuinely understanding. We are functional parrots. We use abstractions like "going to the doctor" in place of genuine understanding. Do we care how the internet works when we pick a cell phone? No? If not, we don't really understand, w
    • "Reasoning" was never part of the fundamental LLM model. But if you brute force it enough it'll do something kinda cool, which is enough to get money, which is enough to get thousands upon thousands of people brute forcing it, fundamentals be damned.

      I think the only reason LLMs are so popular over other ML approaches is that LLMs self-learn patterns, while many other (stronger) ML approaches require thousands or millions of labeled training samples. This also means that LLMs try to find patterns but have nothing to tell them when they are right.

    • by dfghjk ( 711126 )

      but "reasoning" is a part of intelligence, and therefore AI would exhibit reasoning if it weren't a fraud. And there is no such thing as "the fundamental LLM". Also, you don't say "LLM model", that is redundant.

      • If AI were a fraud why are 200M people using chatGPT? It's clearly useful, and more than a parrot. They can combine ideas in novel ways that make sense, parrots just parrot.
    • Yeah but "reasoning" has definitely been part of the LLM hype.
      • by gweihir ( 88907 )

        Any good scam promises the maximum it can without becoming obvious to most of the marks. It helps that the marks are not very smart. As the "AI" field has run now quite a number of these scams before, they have significant experience on how to do it. But like all "AI hypes" before, this one will die. If people were aware of history, the current AI hype would not have happened.

      • We are moving the goalposts on reasoning. LLMs pull off plenty of reasoning feats that would impress a person from 10 years ago. It's just that we have refined the definition to fit the "humans understand, AI doesn't" narrative. If you look closely, humans make tons of reasoning errors too, even trivial ones.
    • by sjames ( 1099 )

      I am entirely unsurprised by this result. I *WISH* I was surprised that some people (especially people in the field) were surprised by this, but I can't say I am.

      As you said, the model simply wasn't designed for reasoning. I'll go one further and state that we have no clear idea how to go about designing an AI neural net that does reason.

  • Reasoning (Score:5, Interesting)

    by systemd-anonymousd ( 6652324 ) on Sunday October 13, 2024 @04:58PM (#64861573)

    Future models can be improved for formalized reasoning, but what's beyond obvious at this point is that next-token text prediction is far more powerful than anyone ever imagined. Our current models out-perform graduate students and can be massive helps for professionals. It's still up to you to figure out how to integrate them into your workflow. For professional software engineering I've found them to be hugely useful, like a rubber duck that also has instant PhD-level knowledge of specific tasks that I'm often just learning or only loosely familiar with. It's a productivity booster and a much better search engine, most of the time.

    • by bjoast ( 1310293 ) on Sunday October 13, 2024 @05:02PM (#64861581)
      No, this is Slashdot. You are supposed to think that recent advances in LLMs are completely pointless and that GPT-4o is only slightly better than Google Search was in 2001.
      • That understanding actually comes from an understanding of data structures and algorithms, and from knowing that this is in fact a very modest advancement on the same level as search. It also comes from having lived through a number of technological revolutions, each of which featured all manner of con artists and hucksters promising things that could not be delivered.

        That's part of what makes computer science a science. We make predictions then we test those predictions. Were those who said "That's not intelligence

      • by dfghjk ( 711126 )

        Google search in 2001 gave you real results; no LLM is as good as that, much less "slightly better".

        If you're going to slag /. posters, don't prove yourself among the dumbest ones in the same sentence.

    • by jythie ( 914043 )
      One does not flow from the other. Yes, they will likely improve and have greater utility, but one of the classic weaknesses of ML-based AI is its lack of symbolic reasoning, and that probably isn't going to change until we get a new generation of researchers who reopen the old pre-ML models... which is probably going to take a while, since those still have the taint of uncool (and unprofitable). By the time people return to them, the people who had worked on them will probably be out of industry, so welcome
      • Why would we need a new generation of researchers for that? That's baseless

      • by gweihir ( 88907 )

        The problem with symbolic reasoning is that it runs into state-space explosion _very_ fast. There is no known solution, despite something like 70 years of intense effort, which has never stopped. A "new generation" of researchers will not do anything here. The evidence strongly indicates the problem is not solvable, or requires some _fundamental_ breakthrough. These are really hard to come by and cannot be forced.
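For a sense of scale (a back-of-the-envelope sketch, not anything from the post above): with blind exhaustive symbolic search, the number of states to examine grows exponentially with the depth of the reasoning chain, which is the state-space explosion being referred to.

```python
# Leaf count for a blind, exhaustive search when every reasoning step can be
# expanded in `branching` different ways.
def states(branching: int, depth: int) -> int:
    return branching ** depth

for depth in (5, 10, 20, 40):
    print(f"branching=10, depth={depth}: {states(10, depth):.3e} states")
```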

    • I do agree there. A few years ago, ChatGPT would be considered an "intern" level. Good enough to fetch stuff, but makes mistakes. Now with the newer LLMs, it is able to do more things, such as generating SCAD models from text. However, it still has a way to go, as the models it does generate may not make sense, or need cleanup.

      We have come far with the newer models, but we have come far with a lot of advances, and eventually diminishing returns hit until we go find another technology, perhaps some other

    • Re:Reasoning (Score:4, Insightful)

      by viperidaenz ( 2515578 ) on Sunday October 13, 2024 @05:37PM (#64861651)

      Hugely useful for software development?
      In specific tasks of boilerplate code generation, maybe.
      The training data for these code models is open source code repositories. At best you can hope for "average" code.
      The model doesn't know which bits of code do what, or whether they do it without bugs.
      It doesn't know how "good" the code is.

      I'm sure it's really good at reciting common examples to common questions.

      • If you're not able to articulate specific tasks, the exact inputs and outputs that are provided and what you desire, and give it enough context to understand the interoperability with your existing system, then you're not a good engineer anyway. Without that all you can expect is boilerplate or leetcode copy-paste, which is obviously not useful to a competent engineer.

        • by dfghjk ( 711126 )

          If you are a good engineer, you do the work yourself. You don't rely on software, designed to discard information and recall what's left imprecisely, to do your engineering work for you. Now, we all know that you use these "hallucinating" tools because you brag about it. On the other hand, no one says you're a good engineer but you.

          • >If you are a good engineer, you do the work yourself.

            Wrong these days. If you're a good engineer you use the tools available to you to get the task done well, and fast.

            > You don't rely on software, designed to discard information and recall what's left imprecisely, to do your engineering work for you.

            Do what you want and get left behind, then be left confused as to why it happened.

            >Now, we all know that you use these "hallucinating" tools because you brag about it.

            A moment ago you said it can only

          • by gweihir ( 88907 )

            On the other hand, no one says you're a good engineer but you.

            Indeed. This person is probably a Dunning-Kruger "far left side" case. I have yet to find any demonstrably good software engineer to praise LLM assistance. But I have found a lot that said "meh" and have mostly stopped using them except as "better search".

        • So now you're just writing your code in LLM-inputs?
          May as well write in the target language.

          You're still going to have to write all the test cases
          And go through the generated code to make sure all the code paths are tested, and make sure the result is what you expect.

      • I find it useful for the kind of thing you'd be comfortable pushing to an intern with Stack Overflow. And that's not bad.

        In no way have these LLMs revolutionized the way I write code. But for many simple tasks, they work reasonably well. And if you only try to leverage them in these situations, you can gain decent productivity!

        For me, I find it particularly helpful for solving simple tasks in tech I don't know well. I am an HPC scientist, so often I set up benchmarks on weird code bases, written in a variety o

        • The code assistant I use routinely suggests completions that are accurate and that are compatible with surrounding code. Hit tab and you saved a lot of typing. I can compose a paragraph of comments describing what I want to do next and it does a fair job of just writing out the whole thing.

          • by dgatwood ( 11270 )

            The code assistant I use routinely suggests completions that are accurate and that are compatible with surrounding code. Hit tab and you saved a lot of typing. I can compose a paragraph of comments describing what I want to do next and it does a fair job of just writing out the whole thing.

            Ah, but which is faster: Writing out the paragraph of comments with enough precision for the LLM to do the right thing (or something close enough that you can massage into the right thing) or using simpler word completion/function name completion and writing the code yourself? In my experience, the latter is faster. Your mileage may vary.

            • I find that if I write out what I'm wanting to do in comments beforehand, it clarifies my thinking. Then if the assistant creates a reasonable facsimile I can work from it.

      • It helps me get a start with some boilerplate that is 70% functional, contains 20% made-up methods, and is 100% inappropriate for the given problem set, but I've still been more productive than without it. I use it to brainstorm, and it never lets me down 33% of the time, if you know what I mean.

        It's frustrating, infuriating even, but I can't deny I'm better off with it. It's great to have another source of unreliable nonsense mixed in with genius, other than just Google and my friend Dave

      • by gweihir ( 88907 )

        The evidence suggests that it provides better searching capabilities, but that generating actual code is something it can only do better than a low-skill coder. There are plenty of those around, so they may be dazzled. But the problem is that these low-skill people cannot check or cleanup LLM results competently. Numerous high-skill coders have since reported that cleaning up LLM-generated code is often _more_ work than writing things yourself.

        There are also other problems: LLMs could well put subtle securi

    • Our current models out-perform graduate students.

      That's more a statement that your tests are broken than a statement that the models are working.

    • by mbkennel ( 97636 )

      They aren't good at deeper reasoning but they're good at memorizing and doing simple applications. The problem there, as a smart search engine, is that they memorize insufficiently well and don't know when they're making mistakes.

    • Our current models can out perform graduate students... at regurgitating facts.

    • by gweihir ( 88907 )

      Future models can be improved for formalized reasoning,

      Nope. Get some basics right.

  • ELIZA? (Score:5, Insightful)

    by ctilsie242 ( 4841247 ) on Sunday October 13, 2024 @05:01PM (#64861579)

    Sometimes I wonder whether, even with all the nodes and capacity that modern LLMs have, we are really that far away from good old ELIZA back in the 70s. We have gone far with CPU, disk, RAM and such, but we may need to go a completely different route for AGI/ASI.

    • by jythie ( 914043 )
      As the saying goes... no matter how much money you spend, you can't turn a pig into a racehorse. You can, however, get an awfully fast pig. ML in general has done well because it can go REALLY fast due to how it maps onto cheap GPUs... but its limitations have not changed.
  • by xtal ( 49134 )

    O1-mini easily identified that the information was irrelevant and produced the correct answers... it is also the first model with basic advanced reasoning.

    Insane resources are being deployed. It is a race to see when they stop getting smarter... not there yet.

  • by godrik ( 1287354 ) on Sunday October 13, 2024 @06:54PM (#64861785)

    I haven't done the test recently, so the results may have changed. If you ask the LLM to produce code to solve knapsack, it will produce the standard dynamic programming solution.
    If you describe a problem as a set of objects with weights and values, where you try to select a subset of objects whose total weight fits within a capacity while maximizing total value, it produces a dynamic programming solution.
    If you present it as a problem where you have a set of zorglubs with foo and bar properties and you want to select a subset of the zorglubs so that their total foo is smaller than grandduck while maximizing their total bar, it gives you a brute-force algorithm.

    So clearly it takes its cues not from the structure of the problem but from the names that get used. Which, fair enough, doesn't mean it is not useful. But virtually all the students in my algorithms class would go: "isn't that just knapsack?"
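For readers who are not in that algorithms class: the "standard dynamic programming" answer for the named version is the classic 0/1 knapsack recurrence, roughly the sketch below; the brute-force fallback for the "zorglub" phrasing would instead enumerate all 2^n subsets.

```python
def knapsack(weights: list[int], values: list[int], capacity: int) -> int:
    # Classic 0/1 knapsack: best[c] = maximum value achievable with capacity c.
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

print(knapsack(weights=[3, 4, 5], values=[30, 50, 60], capacity=8))  # -> 90
```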

  • How could it pass a Turing test without reasoning? It seems that an important component of a test for human thinking would focus on reasoning and logic.
    • It's harder to make strong AI than trick humans. Eliza passed the Turing test in the 1960s [slashdot.org].

      It turns out, humans aren't hard to trick.
    • How could it pass a Turing test without reasoning? It seems that an important component of a test for human thinking would focus on reasoning and logic.

      The Turing test doesn't really test that. Not directly anyway. The test requires a human's judgement of whether an interlocutor is machine or human. All that needs to happen is that the human judge cannot tell which it is. It's not a formal test of reasoning.

    • by gweihir ( 88907 )

      The Turing test is routinely vastly overestimated by amateurs. It does not test what you think it tests.

  • It takes a freaking nuclear power plant to brute-force trillions of floating-point matrix operations. I'd say that's a pretty hard limit. However AI actually ends up getting done, it can't be done like that. I experimented with a very simple AI rules engine decades ago, and ran right into the computational complexity wall after just a few dozen levels of reasoning. When I showed the results to the customer, he told me he wasn't looking for a machine to replace human reasoning, he wanted a machine to just run

  • "Can I get an LLM to lay out a reasoning strategy?" and "can I trip it up with words that would not trip up most postdocs?" are separate questions.

    Some of the LLMs can do the first part now, beyond simple next-token generation.

    A buddy of mine asked one how many calories are in a typical male hippo and it described its plan to calculate it, did the math right, and advised against adding hippo to the diet.

  • Would it be fair to say that in all the LLMs, the AI is just tallying up all the connections and patterns it sees without really "understanding" the data?
    And that the answers it gives come from a best-match pattern search?
    IMHO the AI won't ever achieve the ability to reason. (Not saying AI doesn't have good uses.)
  • So, a study takes a look at two current implementations of LLMs and proclaims that all other existing LLMs don't work and furthermore no other LLMs that will ever be created in the future will work. Why? Because all LLMs are the same, always have been, and always will be in the future. It's not like the hundreds of billions of dollars in hardware are being used to perform research into different LLM architectures, models, and processes.

    Yes, the word "prove" is incompetently used by the article writer and

    • by narcc ( 412956 )

      I'm sorry that reality ruined your silly AI fantasy. I'm amazed that it took this long.

  • Now apply the same test to humans, and I am confident the average person will be found to have "No Ability to Reason".

    When Sophie watches her nephew, she gets out a variety of toys for him. The bag of building blocks has 31 blocks in it. The bin of stuffed animals has 8 stuffed animals inside. The tower of stacking rings has 9 multicolored rings on it. Sophie recently bought a tube of bouncy balls, bringing her total number of toys for her nephew up to 62. How many bouncy balls came in the tube?

    People living in Silicon Valley may be remote from the abstract reasoning abilities of the median human.
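For reference, the quoted problem reduces to a single subtraction; a quick check of the intended answer:

```python
# Total toys minus the toys already accounted for gives the bouncy balls.
blocks, stuffed_animals, rings, total = 31, 8, 9, 62
print(total - (blocks + stuffed_animals + rings))  # 62 - 48 = 14
```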

  • by doragasu ( 2717547 ) on Monday October 14, 2024 @03:17AM (#64862585)

    Now every tech company wants to put it everywhere, but a time will come when we will be removing it from almost every product.
