So I was reading this article about Signal creator Moxie Marlinspike’s new project, Confer, which claims to be a verifiably E2E-encrypted LLM chat service. There are a couple of short blog articles that give the gist of it, and some GitHub repos, including this one with scripts for producing the VM that will run your particular LLM session. But if I’m following this all correctly, it implies that every chat session (or perhaps every logged-in user) would have their own VM running their own LLM to ensure that the chain of trust is complete. This seems impossible from a scalability perspective, as even small LLMs require huge quantities of RAM and compute. Did I miss something fundamental here?

  • Autonomous User@lemmy.world · 2 hours ago

    They think they can get security by using a proprietary chip to hide from the rest of the computer. This is a blatant lie.

  • kumi@feddit.online · 7 hours ago

    Possibly oversimplifying, and I haven’t had a proper read yet: if you trust the hardware and supply-chain security of Intel but not the operational security of Cloudflare or AWS, this would allow you to exchange messages with the LLM without TLS-encryption-stripping infrastructure operators being able to read the messages in cleartext.

    This is a form of Confidential Computing based on Trusted Execution Environments. IMO the really compelling use of TEEs is Verifiable Computing: if you have three servers, each with chips and TEEs from a different vendor, you can run the same execution on all of them and compare the results, which should always agree. That keeps you safe against the compromise of any single one of them, whereas for Confidential Computing, any single one being compromised means the communication is compromised. The random nature of LLM applications makes Verifiable Computing non-trivial, and I’m not sure what the state of the art is there.
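
    A minimal sketch of that comparison step, assuming a `run_job` helper that already handles attestation and transport (the endpoint names are made up):

    ```python
    import hashlib
    from typing import Callable, Sequence

    def verified_result(run_job: Callable[[str], bytes], endpoints: Sequence[str]) -> bytes:
        """Run the same deterministic job on every TEE-backed server and accept
        the result only if all of them agree."""
        results = [run_job(endpoint) for endpoint in endpoints]
        digests = {hashlib.sha256(r).hexdigest() for r in results}
        if len(digests) != 1:
            # A single compromised or faulty server makes the set disagree.
            raise RuntimeError(f"results disagree across TEEs: {digests}")
        return results[0]

    # Toy usage: three hypothetical servers that happen to agree.
    print(verified_result(lambda endpoint: b"same deterministic output",
                          ["intel-tdx.example", "amd-sev.example", "arm-cca.example"]))
    ```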

    And yes it does look like it has overhead.

    This seems impossible from a scalability perspective, as even small LLMs require huge quantities of RAM and compute. Did I miss something fundamental here?

    Well, isn’t it the other way around? If the per-user resources are high, the additional sublinear overhead of isolation gets relatively smaller. It costs more to run 1000 VMs with 32 MB of RAM each than 2 VMs with 16 GB each, even though the total guest RAM is roughly the same, because the fixed per-VM overhead is paid 1000 times instead of twice.
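
    Rough numbers to make that concrete (the 100 MB fixed per-VM overhead is just an assumed figure for illustration):

    ```python
    # Roughly the same total guest RAM (~32 GB) in both layouts; what differs is
    # how many times the fixed per-VM overhead (firmware, kernel, page tables, ...)
    # gets paid. The 100 MB figure is assumed purely for illustration.
    overhead_mb = 100

    many_small = 1000 * (32 + overhead_mb)       # 1000 VMs x 32 MB each
    few_large  = 2 * (16 * 1024 + overhead_mb)   # 2 VMs x 16 GB each

    print(many_small, "MB vs", few_large, "MB")  # 132000 MB vs 32968 MB
    ```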

    However I guess this might get in the way of batching and sharing resources between users? Is this mentioned?

    • FauxLiving@lemmy.world · 3 hours ago

      The random nature of LLM applications makes Verifiable Computing non-trivial and I’m not sure what the state-of-art is there.

      Running inference is only pseudorandom: the model’s output is treated as a distribution, and a pseudorandom selection is made according to that distribution. The heavy compute parts are all deterministic; the bit at the end that adds chatbot flavor is the only pseudorandom part.

      As long as they share entropy sources and are given the same seed, they will always produce the same output. This is a trick that’s exploited in image-generation applications: caching execution keyed on the seed means that alterations only need to be recalculated from the part of the pipeline that actually changed (saving all of the previous steps).
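
      A toy illustration of that split (the four-token distribution is made up, and numpy’s seeded Generator stands in for whatever sampler the serving stack actually uses):

      ```python
      import numpy as np

      # Pretend these came out of the deterministic forward pass.
      token_ids = np.array([101, 202, 303, 404])
      probs = np.array([0.70, 0.15, 0.10, 0.05])

      def sample_next_token(seed: int) -> int:
          rng = np.random.default_rng(seed)           # shared entropy source / seed
          return int(rng.choice(token_ids, p=probs))  # the "chatbot flavor" step

      # Same seed -> same pseudorandom choice, every time.
      assert sample_next_token(42) == sample_next_token(42)
      ```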

    • Pup Biru@aussie.zone · 3 hours ago

      LLMs don’t have to be random AFAIK: if you turn down the temperature parameter and send the same seed every time, you get the same result.

      https://dylancastillo.co/posts/seed-temperature-llms.html

      For most people this isn’t exactly what you want, because “temperature” is often shorthanded as “creativity”: it controls how far out of left field the result can be.
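
      e.g. with the OpenAI Python client, something like this (model and prompt are placeholders; `seed` is documented as best-effort reproducibility rather than a hard guarantee):

      ```python
      from openai import OpenAI

      client = OpenAI()  # expects OPENAI_API_KEY in the environment

      resp = client.chat.completions.create(
          model="gpt-4o-mini",   # placeholder model name
          messages=[{"role": "user", "content": "Say something surprising."}],
          temperature=0,         # no "creativity"
          seed=42,               # same seed -> (best effort) the same completion
      )
      print(resp.choices[0].message.content)
      ```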

    • artifex@piefed.social (OP) · 7 hours ago

      The articles are light on detail but the code’s all there. The approach makes sense if the VMs are not cryptographically signed with the user’s key, but are just signed against another key to verify authenticity. I read it as if each VM was created on the fly for a user and signed with that user’s key, but that seems unlikely after re-reading it.

  • Dave@lemmy.nz · 10 hours ago

    I think the blog article you link implies you do not have your own VM. LLMs are stateless; the previous conversation is fed in as part of the prompt.

    You send your message, which is E2E encrypted. The LLM runs in an environment where it can decrypt your message and run it through the model, then send a response back to you. Then it gets the next user’s message and replies to them.

    The key part is that the LLM is running inside an encrypted environment not accessible to the host system, so no one can watch as it decrypts your message.
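
    Sketched as code, that flow would look roughly like this (all names are made up; the real system presumably uses attested per-session keys rather than a simple shared secret):

    ```python
    from cryptography.fernet import Fernet

    # Toy stand-ins: a per-session key known only to the client and the enclave,
    # and a trivial "model". Purely illustrative.
    session_key = Fernet.generate_key()

    def fake_llm(history: list[str], message: str) -> str:
        return f"echo: {message} (saw {len(history)} earlier turns)"

    def handle_request(encrypted_blob: bytes) -> bytes:
        """Runs inside the enclave; the host only ever handles ciphertext."""
        lines = Fernet(session_key).decrypt(encrypted_blob).decode().split("\n")
        history, message = lines[:-1], lines[-1]   # client resends prior turns each time
        return Fernet(session_key).encrypt(fake_llm(history, message).encode())

    # Client side: encrypt the conversation, send it, decrypt the reply.
    blob = Fernet(session_key).encrypt(b"earlier turn\nWhat's the weather?")
    print(Fernet(session_key).decrypt(handle_request(blob)).decode())
    ```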

    That’s what I get from reading your links.

    • artifex@piefed.social (OP) · 9 hours ago

      Ok, I interpreted it to mean that the VMs were being created as needed and keyed to your key specifically (which would be the most secure scenario, I think), and I couldn’t figure out how that could possibly work economically. But it makes more sense if it’s just a separately encrypted host decrypting your request and encrypting your reply along with everyone else’s.

  • Lemmchen@feddit.org · 9 hours ago

    in TEE environment

    Moxie loves these closed-source hardware enclaves. Signal servers and MobileCoin rely on this too, if I’m not mistaken.
    In my opinion this is not the right way to go about these things, but then again I’m not a renowned cryptohead like Moxie.