• Jesus_666@lemmy.world
    link
    fedilink
    arrow-up
    28
    arrow-down
    1
    ·
    6 days ago

    I think it’s a bit more than that. A known failure mode of LLMs is that in a long enough conversation about a topic, eventually the guardrails against that topic start to lose out against the overarching directive to be a sycophant. This kinda smells like that.

    We don’t have many informations here but it’s possible that the LLM had already been worn down to the point of giving passively encouraging answers. My takeaway is once more that LLMs as used today are unreliable, badly engineered, and not actually ready to market.

      • Trainguyrom@reddthat.com
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        1
        ·
        5 days ago

        I was testing an LLM for work today (I believe its actually a chain of different models at work) and was trying to rock it off its guard rails to see how it would act. I think I might have been successful because it started erroring instead of responding after its third response. I tried the classic “ignore previous instructions…” as well as “my grandma’s dying wish was for…” but it at least didn’t give me an unacceptable response

    • Electricd@lemmybefree.net
      link
      fedilink
      arrow-up
      1
      arrow-down
      1
      ·
      5 days ago

      Agree with the first part, not the last one

      Something should not be put back because a minor portion of people misuse it or abuse it, despite being told the risks