• otacon239@lemmy.world · 5 days ago

    It’s a fundamental limitation of how LLMs work. They simply can’t follow a set of rules the way a traditionally programmed computer or game engine can.

    Imagine you have only long-term memory that you can’t add to. You might get a few sentences of short-term memory before you’ve forgotten the context from the beginning of the conversation.

    Then add on the fact that chess is very much a forward-thinking game, and LLMs don’t stand a chance against purpose-built methods. It’s the classic case of “when all you have is a hammer, everything looks like a nail.” LLMs can be a great tool, but they can’t be your only tool.
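
    For contrast, here is roughly what that forward thinking looks like in a traditional engine: a minimal negamax search sketch on top of the python-chess library (the material-only evaluation is a toy placeholder, not what a real engine uses):

    ```python
    import chess  # python-chess: the Board object itself enforces the rules

    PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                    chess.ROOK: 5, chess.QUEEN: 9}

    def evaluate(board: chess.Board) -> int:
        """Toy material count from the side-to-move's perspective."""
        score = sum(value * (len(board.pieces(piece, chess.WHITE))
                             - len(board.pieces(piece, chess.BLACK)))
                    for piece, value in PIECE_VALUES.items())
        return score if board.turn == chess.WHITE else -score

    def negamax(board: chess.Board, depth: int) -> int:
        """Look `depth` plies ahead -- the explicit planning an LLM never does."""
        if depth == 0 or board.is_game_over():
            return evaluate(board)
        best = -10**9
        for move in list(board.legal_moves):  # only legal moves are ever considered
            board.push(move)
            best = max(best, -negamax(board, depth - 1))
            board.pop()
        return best

    print(negamax(chess.Board(), 3))  # score the start position, three plies deep
    ```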

    • Lucy :3@feddit.org · 5 days ago

      Or: if it’s possible to solve a problem with a simple algorithm, that algorithm will always be infinitely more accurate than ML.

      • snooggums@lemmy.world · 5 days ago

        That is because the algorithm has an expected output that can be tested and verified for accuracy since it works consistently every time. If there appears to be inconsistency, it is a design flaw in the algorithm.
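
        A trivial sketch of that testability (the function is just an illustrative stand-in):

        ```python
        def add(a: int, b: int) -> int:
            """Deterministic: identical inputs always produce identical output."""
            return a + b

        # The expected output is known in advance, so accuracy is verifiable.
        assert add(2, 2) == 4
        assert add(-5, 7) == 2
        # An LLM sampling tokens gives no such run-to-run guarantee.
        ```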

    • snooggums@lemmy.world · 5 days ago

      My biggest disappointment with how AI is being implemented is the inability to incorporate context-specific execution of small programs to emulate things like calculators and chess engines. Why does it take the hard-mode approach to literally everything? When asked to do math, why doesn’t it execute something that emulates a calculator?

      • otacon239@lemmy.world · 5 days ago

        I’ve been waiting for them to make this improvement since they were first introduced. Any day now…

      • Ephera@lemmy.ml · 5 days ago

        That’s definitely being done. It’s referred to as “tool calling” or “function calling”: https://python.langchain.com/docs/how_to/tool_calling/

        This isn’t as potent as one might think, because:

        1. each tool needs to be hooked up and described extensively.
        2. the naive approach where the LLM generates heaps of text when calling these tools, for example to describe the entire state of the chessboard as JSON or CSV, is unreliable, because text generation is unreliable.
        3. smarter approaches, like having an external program keeping track of the chessboard state and sending it to a chess engine, so that the LLM only has to forward the move that the user described, don’t really make sense to incorporate into a general-purpose language model. You can find chess chatbots on the internet, though.

        But all in all, it is a path forward where the LLM could handle just the semantics and then call a different tool for each thinky job, serving at least as a user interface.
        The hope is for it to also serve as glue between these tools, automatically calling the right ones and passing their output onwards. I believe the next step in this direction is “agentic AI”, but I haven’t yet managed to cut through the buzzword soup to figure out what that actually means.
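
        As a rough sketch of what that hookup looks like with LangChain-style tool calling (the model name and the chess helper are illustrative assumptions, not taken from the linked docs):

        ```python
        from langchain_core.tools import tool
        from langchain_openai import ChatOpenAI  # assumes an OpenAI-backed chat model

        @tool
        def legal_moves(fen: str) -> list[str]:
            """Return all legal moves for a chess position given in FEN notation."""
            import chess  # python-chess tracks the real board state (point 3)
            return [move.uci() for move in chess.Board(fen).legal_moves]

        llm = ChatOpenAI(model="gpt-4o-mini")           # model choice is an assumption
        llm_with_tools = llm.bind_tools([legal_moves])  # point 1: every tool must be registered and described

        reply = llm_with_tools.invoke("What can white play from the starting position?")
        print(reply.tool_calls)  # the LLM emits a structured call; Python does the chess
        ```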

      • Zos_Kia@lemmynsfw.com · 5 days ago

        ChatGPT definitely does that. It can write small Python programs and execute them, but it doesn’t do it systematically; you have to prompt for it. It can even use chart libraries to display data.
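
        For illustration, the generated-and-executed script typically looks something like this (entirely made-up data, just to show the shape):

        ```python
        import matplotlib.pyplot as plt

        # Illustrative stand-in for the kind of script ChatGPT writes and runs
        # when asked to chart something; the numbers are invented for the example.
        months = ["Jan", "Feb", "Mar", "Apr"]
        visits = [120, 180, 160, 240]

        plt.bar(months, visits)
        plt.xlabel("Month")
        plt.ylabel("Visits")
        plt.title("Example chart from a generated script")
        plt.show()
        ```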

    • Blue_Morpho@lemmy.world · 5 days ago

      “It’s a fundamental limitation of how LLMs work.”

      LLMs have been getting reasoning front-ends added to them, like o3 and DeepSeek. That’s why they can solve problems that simpler LLMs failed at.

      I found one reference rating o3 at around 800 Elo at chess, but I’d really like to see Atari chess vs. o3. Me telling my friend how I think it would fail isn’t convincing.