Or, with AI image gen, it knows that when some one asks it for an image of a hand holding a pencil, it looks at all the artwork in it’s training database and says, “this collection of pixels is probably what they want”.
This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.
It’s a general-use model that can generate basically anything you want. It’s not perfect and it’s not the latest & greatest AI image generation model, but it’s a great example because anyone can download it and run it locally on their own PC (and get vastly superior results than ChatGPT’s DALL-E model).
If you examine the data inside the model, you’ll see a bunch of metadata headers and then an enormous array of arrays of floating point values. Stuff like, [0.01645, 0.67235, ...]. That is what a generative image AI model uses to make images. There’s no database to speak of.
When training an image model, you need to download millions upon millions of public images from the Internet and run them through their paces against an actual database like ImageNET. ImageNET contains lots of metadata about millions of images such as their URL, bounding boxes around parts of the image, and keywords associated with those bounding boxes.
The training is mostly a linear process. So the images never really get loaded into an database, they just get read along with their metadata into a GPU where it performs some Machine Learning stuff to generate some arrays of floating point values. Those values ultimately will end up in the model file.
It’s actually a lot more complicated than that (there’s pretraining steps and classifiers and verification/safety stuff and more) but that’s the gist of it.
I see soooo many people who think image AI generation is literally pulling pixels out of existing images but that’s not how it works at all. It’s not even remotely how it works.
When an image model is being trained, any given image might modify one of those floating point values by like ±0.01. That’s it. That’s all it does when it trains on a specific image.
I often rant about where this process goes wrong and how it can result in images that look way too much like some specific images in training data but that’s a flaw, not a feature. It’s something that every image model has to deal with and will improve over time.
At the heart of every AI image generation is a random number generator. Sometimes you’ll get something similar to an original work. Especially if you generate thousands and thousands of images. That doesn’t mean the model itself was engineered to do that. Also: A lot of that kind of problem happens in the inference step but that’s a really complicated topic…
This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.
… snip…
The training is mostly a linear process. So the images never really get loaded into an database, they just get read along with their metadata into a GPU where it performs some Machine Learning stuff to generate some arrays of floating point values. Those values ultimately will end up in the model file.
Where does it get read from? a database, right? yeah. that’s called a database. It may not be a large massive repository of art to rival the Vatican’s secret collection, but it is a database of digital art.
as for it being complex… yeah. that’s why I kept it simple and glossed over all the complex stuff that’s not really, you know. relevant to the question of who owns it.
I did stumble on an interesting AI use that seems super legit for creatives:
There’s an AI powered app for a specific brand of guitar amplifier. If you want your guitar to sound like a particular artist or a particular song, you tell it via a natural language input and it does all the adjustments for you.
You STILL have to have the personal talent to, you know, PLAY the guitar, but it saves you hours of fiddling with dials and figuring out what effects and pedals to apply to get the sound you’re looking for.
Video, same player, same guitar, same amp, multiple sounds:
From a purely artistic perspective, this would be like asking AI for a Pantone or RGB palette set for a specific work of art. All it’s doing is telling you the colors so you can avoid doing all the research and mixing yourself.
This is incorrect. Generative image models don’t contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.
As an example model, FLUX.dev is 23.8GB:
https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
It’s a general-use model that can generate basically anything you want. It’s not perfect and it’s not the latest & greatest AI image generation model, but it’s a great example because anyone can download it and run it locally on their own PC (and get vastly superior results than ChatGPT’s DALL-E model).
If you examine the data inside the model, you’ll see a bunch of metadata headers and then an enormous array of arrays of floating point values. Stuff like,
[0.01645, 0.67235, ...]. That is what a generative image AI model uses to make images. There’s no database to speak of.When training an image model, you need to download millions upon millions of public images from the Internet and run them through their paces against an actual database like ImageNET. ImageNET contains lots of metadata about millions of images such as their URL, bounding boxes around parts of the image, and keywords associated with those bounding boxes.
The training is mostly a linear process. So the images never really get loaded into an database, they just get read along with their metadata into a GPU where it performs some Machine Learning stuff to generate some arrays of floating point values. Those values ultimately will end up in the model file.
It’s actually a lot more complicated than that (there’s pretraining steps and classifiers and verification/safety stuff and more) but that’s the gist of it.
I see soooo many people who think image AI generation is literally pulling pixels out of existing images but that’s not how it works at all. It’s not even remotely how it works.
When an image model is being trained, any given image might modify one of those floating point values by like ±0.01. That’s it. That’s all it does when it trains on a specific image.
I often rant about where this process goes wrong and how it can result in images that look way too much like some specific images in training data but that’s a flaw, not a feature. It’s something that every image model has to deal with and will improve over time.
At the heart of every AI image generation is a random number generator. Sometimes you’ll get something similar to an original work. Especially if you generate thousands and thousands of images. That doesn’t mean the model itself was engineered to do that. Also: A lot of that kind of problem happens in the inference step but that’s a really complicated topic…
Where does it get read from? a database, right? yeah. that’s called a database. It may not be a large massive repository of art to rival the Vatican’s secret collection, but it is a database of digital art.
as for it being complex… yeah. that’s why I kept it simple and glossed over all the complex stuff that’s not really, you know. relevant to the question of who owns it.
I did stumble on an interesting AI use that seems super legit for creatives:
There’s an AI powered app for a specific brand of guitar amplifier. If you want your guitar to sound like a particular artist or a particular song, you tell it via a natural language input and it does all the adjustments for you.
You STILL have to have the personal talent to, you know, PLAY the guitar, but it saves you hours of fiddling with dials and figuring out what effects and pedals to apply to get the sound you’re looking for.
Video, same player, same guitar, same amp, multiple sounds:
https://youtube.com/shorts/wsGj4zsfOuQ
From a purely artistic perspective, this would be like asking AI for a Pantone or RGB palette set for a specific work of art. All it’s doing is telling you the colors so you can avoid doing all the research and mixing yourself.
How you USE those colors? That’s on you!