It’s worth reading the article rather than trying to answer the headline.

  • Riskable@programming.dev
    9 hours ago

    Or, with AI image gen, it knows that when someone asks it for an image of a hand holding a pencil, it looks at all the artwork in its training database and says, “this collection of pixels is probably what they want”.

    This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.

    As an example, the FLUX.1-dev model is 23.8GB:

    https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

    It’s a general-use model that can generate basically anything you want. It’s not perfect and it’s not the latest and greatest AI image generation model, but it’s a great example because anyone can download it and run it locally on their own PC (and get vastly better results than with ChatGPT’s DALL-E model).
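
    To put the compression point in numbers, here is a rough back-of-the-envelope sketch. Only the 23.8GB figure comes from the model page above; the image count and the average source image size are assumptions picked for illustration, not published figures for this model.

```python
# Rough arithmetic: how many bytes of model exist per training image?
MODEL_SIZE_GB = 23.8                    # size quoted above for FLUX.1-dev
ASSUMED_TRAINING_IMAGES = 100_000_000   # hypothetical; "millions upon millions"
ASSUMED_IMAGE_BYTES = 500_000           # hypothetical average size of one source image

model_bytes = MODEL_SIZE_GB * 1e9
bytes_per_image = model_bytes / ASSUMED_TRAINING_IMAGES
implied_ratio = ASSUMED_IMAGE_BYTES / bytes_per_image

print(f"{bytes_per_image:.0f} bytes of model per training image")
print(f"implied compression ratio: roughly {implied_ratio:,.0f}:1")
```

    Under those assumptions, each training image would get a few hundred bytes of the file, far too little to store the picture itself.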

    If you examine the data inside the model, you’ll see a bunch of metadata headers and then an enormous array of arrays of floating point values. Stuff like, [0.01645, 0.67235, ...]. That is what a generative image AI model uses to make images. There’s no database to speak of.
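
    If you want to see that layout for yourself, here is a minimal sketch using only the Python standard library. It writes a toy file in the safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes) and reads it back. The tensor name `layer.weight` and the values are made up, and the reader only handles 32-bit floats; real checkpoints may use other dtypes.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_tiny_safetensors(path, name, values):
    """Write one float32 tensor using the safetensors file layout."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        "__metadata__": {"note": "toy file"},
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON metadata header
        f.write(data)                                  # raw float values


def read_safetensors(path):
    """Return (header, tensors) for a file that contains only F32 tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        start, end = info["data_offsets"]  # offsets are relative to the payload
        count = (end - start) // 4         # 4 bytes per float32
        tensors[name] = list(struct.unpack(f"<{count}f", payload[start:end]))
    return header, tensors


path = Path(tempfile.mkdtemp()) / "toy.safetensors"
write_tiny_safetensors(path, "layer.weight", [0.01645, 0.67235, -0.25])
header, tensors = read_safetensors(path)
print(sorted(header))
print(tensors["layer.weight"])
```

    Everything the reader finds is either a header entry or a number. There is no table of images to query.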

    When training an image model, you need to download millions upon millions of public images from the Internet and run them through their paces against an actual database like ImageNet. ImageNet contains lots of metadata about millions of images such as their URL, bounding boxes around parts of the image, and keywords associated with those bounding boxes.

    The training is mostly a linear process. So the images never really get loaded into a database, they just get read along with their metadata into a GPU, where the training code performs some Machine Learning stuff to generate some arrays of floating point values. Those values ultimately will end up in the model file.
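
    The "read it, use it, move on" flow can be sketched like this. It is a toy, not a real training loop: `stream_training_records` is a hypothetical stand-in for reading images and metadata, and the update is a plain moving average rather than actual machine learning. The point is only that the loop keeps the weights and nothing else.

```python
def stream_training_records():
    """Hypothetical stand-in for reading (pixels, caption) pairs one at a time."""
    yield [0.1, 0.9, 0.4], "hand holding a pencil"
    yield [0.8, 0.2, 0.5], "guitar amplifier"
    yield [0.3, 0.3, 0.7], "color palette"


def train(records, learning_rate=0.1):
    """Consume records one by one; only the weights survive the loop."""
    weights = [0.0, 0.0, 0.0]
    seen = 0
    for pixels, _caption in records:
        # Nudge each weight a little toward this example, then let the example go.
        weights = [w + learning_rate * (p - w) for w, p in zip(weights, pixels)]
        seen += 1
    return weights, seen


weights, seen = train(stream_training_records())
print(seen, "records read,", len(weights), "numbers kept")
```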

    It’s actually a lot more complicated than that (there’s pretraining steps and classifiers and verification/safety stuff and more) but that’s the gist of it.

    I see soooo many people who think image AI generation is literally pulling pixels out of existing images but that’s not how it works at all. It’s not even remotely how it works.

    When an image model is being trained, any given image might modify one of those floating point values by like ±0.01. That’s it. That’s all it does when it trains on a specific image.
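
    Here is a minimal sketch of what "one image nudges a value a tiny bit" looks like, using plain gradient descent on a single weight. Real image models have billions of weights and use batches and fancier optimizers, so treat the numbers, the learning rate, and the one-weight "model" as assumptions for illustration.

```python
def sgd_step(w, x, target, learning_rate):
    """One gradient descent step for the toy model: prediction = w * x."""
    prediction = w * x
    error = prediction - target
    gradient = 2 * error * x  # derivative of (w * x - target) ** 2 with respect to w
    return w - learning_rate * gradient


w = 0.5
# Each (x, target) pair stands in for one training image.
samples = [(1.0, 0.7), (2.0, 1.2), (0.5, 0.4)]
nudges = []
for x, target in samples:
    new_w = sgd_step(w, x, target, learning_rate=0.01)
    nudges.append(new_w - w)  # how far this one sample moved the weight
    w = new_w

print("per-sample nudges:", nudges)
print("final weight:", w)
```

    Each sample moves the weight by less than 0.01 here, and none of the samples is stored anywhere afterwards.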

    I often rant about where this process goes wrong and how it can result in images that look way too much like some specific images in the training data, but that’s a flaw, not a feature. It’s something that every image model has to deal with, and it will improve over time.

    At the heart of every AI image generator is a random number generator. Sometimes you’ll get something similar to an original work, especially if you generate thousands and thousands of images. That doesn’t mean the model itself was engineered to do that. Also: a lot of that kind of problem happens in the inference step, but that’s a really complicated topic…
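
    The random number part is easy to sketch. Diffusion-style generators start from random noise picked by a seed, and the model then turns that noise into a picture. The snippet below only shows the seed-to-noise step; the function name `initial_noise` and the tiny size are made up for illustration, and no model is involved.

```python
import random


def initial_noise(seed, size=8):
    """Return `size` Gaussian noise values determined entirely by `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(42)
same_b = initial_noise(42)
different = initial_noise(43)

print(same_a == same_b)     # same seed, same starting noise
print(same_a == different)  # new seed, new starting noise
```

    Same seed, same starting point. Change the seed and you start somewhere else, which is why thousands of generations cover a lot of ground.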

    • FuglyDuck@lemmy.world
      8 hours ago

      This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever. … snip … The training is mostly a linear process. So the images never really get loaded into a database, they just get read along with their metadata into a GPU where it performs some Machine Learning stuff to generate some arrays of floating point values. Those values ultimately will end up in the model file.

      Where does it get read from? A database, right? Yeah. That’s called a database. It may not be a massive repository of art to rival the Vatican’s secret collection, but it is a database of digital art.

      As for it being complex… yeah. That’s why I kept it simple and glossed over all the complex stuff that’s not really, you know, relevant to the question of who owns it.

    • jordanlund@lemmy.world
      8 hours ago

      I did stumble on an interesting AI use that seems super legit for creatives:

      There’s an AI powered app for a specific brand of guitar amplifier. If you want your guitar to sound like a particular artist or a particular song, you tell it via a natural language input and it does all the adjustments for you.

      You STILL have to have the personal talent to, you know, PLAY the guitar, but it saves you hours of fiddling with dials and figuring out what effects and pedals to apply to get the sound you’re looking for.

      Video, same player, same guitar, same amp, multiple sounds:

      https://youtube.com/shorts/wsGj4zsfOuQ

      From a purely artistic perspective, this would be like asking AI for a Pantone or RGB palette set for a specific work of art. All it’s doing is telling you the colors so you can avoid doing all the research and mixing yourself.

      How you USE those colors? That’s on you!