Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: That many of the changes the company has made over the last year in the leadup to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

  • Fubarberry@sopuli.xyz · 1 year ago

    Reddit has long had an issue with confidently presenting false statements as fact. Sometimes I would come across a question I was well educated on, and the top-voted responses were all clearly wrong, yet sounded correct to someone who didn’t know better. This made me question all the other posts I had believed without knowing enough to tell otherwise.

    LLMs have the same issue of confidently stating falsehoods that sound true. Training on Reddit will only make this worse.

  • Daniyyel@lemm.ee · 1 year ago

    Is this a long-term source of revenue for Reddit? Or will it lose value at some point, simply because LLMs have already been trained sufficiently on user-generated content and there’s nothing more to learn?

    Also, it seems a lot of content on Reddit is already AI generated, so the models would be training on data from other LLMs, which I’m sure doesn’t improve quality.