AI went nuts on my website and generated a $155 excessive bandwidth bill

limelight79@lemmy.world · 30 days ago

AI went nuts on my website and generated a $155 excessive bandwidth bill

artyom@piefed.social · edit-2 29 days ago

What you are experiencing is the unfortunate reality of hosting any kind of site on the open internet in the AI era. You can’t do it without implementing some sort of bot detection and rate limiting or your site will either be DDOS’d or you’ll incurr insane fees from your provider.

The bot suggested things like “robots.txt”,

You can do that but they will ignore it.

I’ll re-enable the site in a few days after the dust settles.

They’ll just attack again.

It’s clearly a bug.

It’s not a bug. This is very common practice these days.

My provider recommended implementing Cloudflare, which initially irritated me, until I realized there was a free tier.

Please consider Anubis instead.

gladflag@lemmy.ml · 29 days ago

TBH it feels like a bug if they’re redownloading the same images again and again.

DreamButt@lemmy.world · 29 days ago

Assuming A) honest intentions and B) they give a fuck

OpenAi isn’t exactly known for either

leds@feddit.dk · 29 days ago

I’m wondering, are they intentionally trying to kill the open web? Make small websites give up and then AI has monopoly on useful information?

jollyrogue@lemmy.ml · 28 days ago

Yes. This is it.

One of the great things about Web3 and AI, for corps, is forcing decentralized systems into centralized platforms, limiting hosting access to people who have money, and limiting competition to companies which have the capital to invest in mitigations, or the money to pay for exceptions.

sexhaver87@sh.itjust.works · edit-2 28 days ago

Their intentions remain unclear, however given their CEO’s desire for unchecked mass-scale absolute power, I’d bet on this!

e: all this is in addition to the data they collect via their web crawling, the bugs resulting in this behavior and its effects are either happy accidents or intentional malware, right now depending on your distaste for the company. Ultimately none of this is set in stone until the psychotic criminals at OpenAI get audited or jailed.

artyom@piefed.social · 29 days ago

I would agree except they do the same thing to thousands (millions?) of sites across the web every day. Google will scrape your site as well but they manage to do it on a way that doesn’t absolutely destroy it.

limelight79@lemmy.world · 29 days ago

Yeah exactly. I want people to be able to find the info, that’s the whole point. Legitimate search engines, even Bing, are fine.

DaPorkchop_@lemmy.ml · 28 days ago

I beg to differ, a few months ago my site was getting absolutely hammered by GoogleBot with hundreds of requests per second, faster than my server could keep up with - to the point that the entire apache daemon kept locking up.

wonderingwanderer@sopuli.xyz · 29 days ago

Sounds like a class action lawsuit

artyom@piefed.social · edit-2 29 days ago

Good luck, they are firmly in the pocket of the federal govt at this point. They’re allowed to do whatever they want because our entire economy hinges on allowing them to do so.

AsgerFD@programming.dev · edit-2 29 days ago

There’s also Iocaine ~~as an alternative to Anubis~~. I’ve not tried Anubis nor Iocaine myself though.

db0@lemmy.dbzer0.com · 29 days ago

Iocaine is not an alternative to anubis. It’s a different tool in the toolbox and can be used along with it, but it has a different purpose. Something like haphash is an anubis alternative

NotMyOldRedditName@lemmy.world · edit-2 29 days ago

Even if not using cloudflare / others for bot protection, setting up your images and other static content to be served from a CDN can help.

You can set that up with cloudflare and others as well.

CallMeAnAI@lemmy.world · edit-2 29 days ago

Removed by mod

artyom@piefed.social · 29 days ago

It is not at all 30 years old. Search engines never DDOS’d your site to death. Only these shitty AI scrapers do that. That’s why everyone wants their site scraped by Google while spending inordinate amounts of time and money to block AI scraper bots.

CallMeAnAI@lemmy.world · 29 days ago

Removed by mod

artyom@piefed.social · 29 days ago

Oh, honey…

elbarto777@lemmy.world · edit-2 11 days ago

deleted by creator

dependencyinjection@discuss.tchncs.de · 29 days ago

You keep saying this but that’s just not the case. Robots.txt would be respected by older bots and they wouldn’t DDOS your site either.

artyom@piefed.social · 29 days ago

I said it once. And “older bots” are not the problem. AI crawlers are the problem. Robots.txt was originally created with the primary intent of keeping websites off of search engines. Server load wasn’t even really an issue.

CallMeAnAI@lemmy.world · 29 days ago

Removed by mod

dependencyinjection@discuss.tchncs.de · 29 days ago

Read the room bro.