AI went nuts on my website and generated a $155 excessive bandwidth bill

limelight79@lemmy.world · 4 days ago

AI went nuts on my website and generated a $155 excessive bandwidth bill

utopiah@lemmy.world · 3 days ago

Tech support found that AI bots were crawling the site repeatedly. In particular, OpenAI’s bot was hitting it extremely hard.

Yup… I just had to read your title to know how it happened. In fact more than a year ago at OFFDEM (the off discussion parallel to FOSDEM in Brussels) we discussed how to mitigate such practices because at least 2 of us self-hosting had this problem. I had problem with my own forge because AI crawlers generate archives and that quickly generate quite a bit of space. It’s a well known problem that’s why there are quite a few “mazes” out there or simply blocking rules for HTTPS or reverse proxies.

AI hype is so destructive for the Web.

Spice Hoarder@lemmy.zip · 2 days ago

There has to be a better way to do this. Like using a hash or something to tell if a bot even Need to scrape again.

utopiah@lemmy.world · 2 days ago

No doubt there are better ways … but I believe pure players, e.g. OpenAI or Anthropic, or resellers who get paid with scaling, e.g AWS, equate very large scale with moat. So they get so much funding that they have ridiculous computing resources, probably way WAY cheaper for “old” cloud (i.e. anything but GPUs) than new cloud (GPUs) so basically they put 0 effort to optimize anything. They probably even brag about how large their “dataset” is despite it being full of garbage data. They don’t care because in their marketing materials they claim to train over Exabytes of data or whatever.