On January 1, I received a $155 bill from my web hosting provider for a bandwidth overage. I’ve never had this happen before. For comparison, I pay about $400/year for the hosting service, and usually the limitation is disk space.

Turns out, on December 17, my bandwidth usage jumped dramatically - see the attached graph.

I run a few different sites, but tech support was able to help me narrow it down to one site. This is a hobbyist site, with a small phpBB forum, for a very specific model of motorhome that hasn’t been built in 25 years. This is NOT a high traffic site; we might get a new post once a week…when it’s busy. I run it on my own dime; there are no ads, no donation links, etc.

Tech support found that AI bots were crawling the site repeatedly. In particular, OpenAI’s bot was hitting it extremely hard.

Here’s an example: There are about 1,500 attachments to posts (mostly images), totaling about 1.5 GB on disk. None of these are huge; a few are in the 3-4 megabyte range, probably larger than necessary, but not outrageously large either. The bot pulled 1.5 terabytes from just those pictures. It kept pulling the same pictures repeatedly and only stopped because I locked the site down. This is insane behavior.
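To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It takes the figures above at face value and assumes decimal units (1 TB = 1,000 GB); the variable names are only illustrative.

```python
# Figures quoted in the post. Decimal units (1 TB = 1000 GB) are an assumption.
attachments = 1_500          # approximate number of attached files
stored_gb = 1.5              # total size of those files on disk
transferred_gb = 1.5 * 1000  # 1.5 TB pulled by the bot, expressed in GB

# Average size of one attachment, in megabytes.
avg_attachment_mb = stored_gb * 1000 / attachments

# How many times the whole attachment set would have to be downloaded
# to add up to the transferred volume.
full_passes = transferred_gb / stored_gb

print(f"average attachment size: {avg_attachment_mb:.1f} MB")
print(f"equivalent full downloads of every attachment: {full_passes:.0f}")
```

In other words, the transfer volume is on the order of a thousand complete copies of the attachment set, which is hard to explain as ordinary indexing.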

I locked down the pictures so you had to be logged in to see them, but the attack continued. This morning I took the site offline to stop the deluge.

My provider recommended implementing Cloudflare, which initially irritated me, until I realized there was a free tier. Cloudflare can block bots, apparently. I’ll re-enable the site in a few days after the dust settles.

I contacted OpenAI and ended up arguing with the support bot on their site, demanding that the bug that caused this be fixed. The bot suggested things like “robots.txt”, which I added, but…come on, their crawler shouldn’t be doing that, and I shouldn’t be on the hook to fix their mistake. It’s clearly a bug. Eventually the bot gave up talking to me, and an apparent human emailed me with the same info. I replied, trying to tell them that their crawler has a bug that causes this. I doubt they care, though.
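For anyone wondering what the “robots.txt” suggestion amounts to, here is a minimal sketch that checks a set of rules with Python's standard `urllib.robotparser`. Everything specific in it is an assumption rather than this site's actual setup: that the crawler identifies itself as `GPTBot`, that attachments live under `/download/` (the usual phpBB path), and the `forum.example` hostname. Keep in mind that robots.txt is purely advisory, so it only helps if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one crawler entirely and keep every
# other bot away from the attachment downloads.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /download/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules permit `agent` to fetch `path`."""
    # forum.example is a placeholder hostname, not the real site.
    return parser.can_fetch(agent, f"https://forum.example{path}")


for agent, path in [
    ("GPTBot", "/viewtopic.php"),
    ("Googlebot", "/viewtopic.php"),
    ("Googlebot", "/download/file.php?id=1"),
]:
    print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

Under these rules, regular search engines can still index the forum topics while the attachments stay off limits to all well-behaved bots.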

I also asked for their billing address, so I can send them a bill for the $155 plus my consulting time. I know it’s unlikely I’ll ever see a dime. Fortunately my provider said they’d waive the fee as a courtesy, as long as I addressed the issue, but if OpenAI does end up coming through, I’ll tell my provider not to waive it. OpenAI is responsible for this and should pay for it.

This incident reinforces all of my beliefs about AI: Use everyone else’s resources and take no responsibility for it.

  • gladflag@lemmy.ml · 87 up, 3 down · 5 days ago

    TBH it feels like a bug if they’re redownloading the same images again and again.

    • DreamButt@lemmy.world · English · 112 up, 4 down · 5 days ago

      Assuming A) honest intentions and B) they give a fuck

      OpenAI isn’t exactly known for either

      • leds@feddit.dk · 32 up, 2 down · 5 days ago

        I’m wondering, are they intentionally trying to kill the open web? Make small websites give up so that AI has a monopoly on useful information?

        • jollyrogue@lemmy.ml · 3 up, 1 down · 3 days ago

          Yes. This is it.

          One of the great things about Web3 and AI, for corps, is forcing decentralized systems into centralized platforms, limiting hosting access to people who have money, and limiting competition to companies which have the capital to invest in mitigations, or the money to pay for exceptions.

        • sexhaver87@sh.itjust.works · 1 up, 1 down · edited 3 days ago

          Their intentions remain unclear; however, given their CEO’s desire for unchecked mass-scale absolute power, I’d bet on this!

          e: All this is in addition to the data they collect via their web crawling. The bugs resulting in this behavior and its effects are either happy accidents or intentional malware, depending for now on your distaste for the company. Ultimately none of this is set in stone until the psychotic criminals at OpenAI get audited or jailed.

    • artyom@piefed.social · English · 9 up · 5 days ago

      I would agree, except they do the same thing to thousands (millions?) of sites across the web every day. Google will scrape your site as well, but they manage to do it in a way that doesn’t absolutely destroy it.

      • DaPorkchop_@lemmy.ml · 2 up · 3 days ago

        I beg to differ: a few months ago my site was getting absolutely hammered by Googlebot with hundreds of requests per second, faster than my server could keep up with, to the point that the entire Apache daemon kept locking up.

      • limelight79@lemmy.world (OP) · 11 up · 5 days ago

        Yeah exactly. I want people to be able to find the info, that’s the whole point. Legitimate search engines, even Bing, are fine.

        • artyom@piefed.social · English · 5 up, 1 down · edited 4 days ago

          Good luck, they are firmly in the pocket of the federal govt at this point. They’re allowed to do whatever they want because our entire economy hinges on allowing them to do so.