• merc@sh.itjust.works
    13 hours ago

    My guess is that this was necessary because the AI companies have already downloaded the offline versions of Wikipedia. But they think they can one-up their competition by having “fresher data,” so they either hammer the download servers, pulling the 25 GB full offline dump multiple times a day just in case it changed, or they crawl and scrape Wikipedia directly so they get the data before it makes it into the next offline dump, or something.

    It wouldn’t be hard for Wikipedia to provide them with a feed of the changes going into the Wikipedia database, so they get the data as fresh as it can possibly be. Plus, doing this most likely reduces the antisocial behaviour the AI companies would otherwise engage in to get their fresh data. Win-win, even if it sucks to give these AI companies a win.
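
    For what it's worth, Wikimedia already exposes something like this: a public recent-changes event stream. Below is a minimal sketch of how a consumer might follow such a feed, assuming an SSE endpoint shaped like Wikimedia's EventStreams `recentchanges` stream; the URL and JSON field names (`wiki`, `title`) reflect that stream but should be treated as illustrative rather than a definitive client.

```python
# Minimal sketch: following a Wikipedia change feed delivered as Server-Sent Events.
# Assumes an endpoint like Wikimedia's EventStreams "recentchanges" stream;
# the URL and payload fields below are assumptions for illustration.
import json
import requests

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchanges"

def follow_changes(url: str = STREAM_URL):
    """Yield change events as dicts, one per edit, as they are published."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            # SSE frames arrive as "data: {...json...}" lines; skip keep-alives.
            if line and line.startswith("data:"):
                try:
                    yield json.loads(line[len("data:"):])
                except json.JSONDecodeError:
                    continue  # partial or malformed frame; ignore and move on

if __name__ == "__main__":
    for event in follow_changes():
        # Each event describes one change; print which wiki and page it touched.
        print(event.get("wiki"), event.get("title"))
```

    A consumer polling a feed like this gets every edit within seconds of it happening, which makes re-downloading the full dump (or scraping pages) to stay "fresh" pointless.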