Camera makers and pencil makers (and the users of those devices) aren't making massive server farms that spy on every drop of information they can get ahold of.
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
Now, when that's the case, where did the devs get the training data? 🤔
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
That's not how generative AI works. It's capable of creating images that include novel elements that weren't in the training set.
Go ahead and ask one to generate a bonkers image description that doesn't exist in its training data and there's a good chance it'll be able to make one for you. The classic example is an "avocado chair", which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.
Yes, I've tried similar silly things. I've asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered, something randomly silly looking, but still not far off base.
But when it comes to inappropriate material, well, the AI shouldn't be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources.
The trainers didn't train the image generator on images of Mr. Bean hugging Pennywise, and yet it's able to generate images of Mr. Bean hugging Pennywise. Yet you insist that it can't generate inappropriate images without having been specifically trained on inappropriate images? Why is that suddenly different?
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.
Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.
So you mean to say, you can't blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can't blame the tool (don't mind that AI is scraping all your data), and can't blame the end users, because some dirty minded people search or post inappropriate things..?
First, you need to figure out exactly what it is that the "blame" is for.
If the problem is the abuse of children, well, none of that actually happened in this case so there's no blame to begin with.
If the problem is possession of CSAM, then that's on the guy who generated them, since they didn't exist at any point before then. The trainers wouldn't have needed to have any of that in the training set, so if you want to blame them you're going to need to do a completely separate investigation into that; the ability of the AI to generate images like that doesn't prove anything.
If the problem is the creation of CSAM, then again, it's the guy who generated them.
If it's the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.
AI only knows what has gone through its training data, both from the developers and the end users.
Yes, and as I've said repeatedly, it's able to synthesize novel images from the things it has learned.
If you train an AI with pictures of green cars and pictures of red apples, it'll be able to figure out how to generate images of red cars and green apples for you.
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
Not exactly. This would be more akin to a company that will 3D print metal parts and assemble them for you. You use this service and have them create and assemble a gun for you. Then you use that weapon in a violent crime. Should the company have known better that you were having them create an illegal weapon on your behalf?
The person who was charged was using Stable Diffusion to generate the images on their own computer, entirely with their own resources. So it's akin to a company that sells 3D printers selling a printer to someone, who then uses it to build a gun.
Sadly that's what most of the gun laws are designed around. Book banning and anti-abortion laws are both limiting tools because of what a small minority choose to do with the tool.
AI image generation shouldn't be considered in obscenity laws. His distribution of pornography to minors should be the issue, because not everyone stuck with that disease should be deprived of tools that can be used to keep them away from hurting others.
Using AI images to increase charges should be wrong. A pedophile contacting and distributing pornography to children should be all that it takes to charge a person. This will just set up a new precedent that is beyond the scope of the judiciary.
It would be more like outlawing ivory grand pianos because they require dead elephants to make - the AI models under question here were trained on abuse.
A person (the arrested software engineer from the article) acquired a tool (a copy of Stable Diffusion, available on GitHub) and used it to commit a crime (trained it to generate CSAM + used it to generate CSAM).
That has nothing to do with the developer of the AI, and everything to do with the person using it. (hence the arrest...)
Unfortunately the developer trained it on some CSAM which I think means they're not free of guilt - we really need to rebuild these models from the ground up to be free of that taint.
Given it's a public dataset not owned or maintained by the developers of Stable Diffusion, I wouldn't consider that their fault either.
I think it's reasonable to expect a dataset like that should have had screening measures to prevent that kind of data being imported in the first place. It shouldn't be on users (here meaning the devs of Stable Diffusion) of that data to ensure there's no illegal content within the billions of images in a public dataset.
That's a different story now that users have been informed of the content within this particular data, but I don't think it should have been assumed to be their responsibility from the beginning.
Sounds to me it would be more like outlawing grand pianos because of all of the dead elephants - while some people are claiming that it is possible to make a grand piano without killing elephants.
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.
No, I'm not - I still have ethical objections and I don't believe CSAM could be generated without some CSAM in the training set. I think it's generally problematic to sexually fantasize about underage persons though I know that's an extremely unpopular opinion here.
So why are you posting all over this thread about how CSAM was included in the training set if that is in your opinion ultimately irrelevant with regards to the topic of the post and discussion, the morality of using AI to generate CSAM?
Because all over this thread are claims that AI CSAM doesn't need actual CSAM to generate. We currently don't have AI CSAM that is taint free and it's unlikely we ever will due to how generative AI works.
So at best we don't know whether or not AI CSAM without CSAM training data is possible. "This AI used CSAM training data" is not an answer to that question. It is even less of an answer to the question "Should AI generated CSAM be illegal?" Just like "elephants get killed for their ivory" is not an answer to "should pianos be illegal?"
If your argument is that yes, all AI CSAM should be illegal whether or not the training used real CSAM, then argue that point. Whether or not any specific AI used CSAM to train is an irrelevant non sequitur. A lot of what you're doing now is replying to "pencils should not be illegal just because some people write bad stuff" with the equivalent of "this one guy did some bad stuff before writing it down". That is completely unrelated to the argument being made.
That's not the point. You don't train a hammer from millions of user inputs.
You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?
Do... Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Or are you arguing that we should be allowed to do what's been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)
One, AI image generators can and will spit out content vastly different than anything in the training dataset (this ofc can be influenced greatly by user input). This can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (I.e., you don't have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal.)
Two, anyone can host an AI model; it's not reserved for big corporations and their server farms. You can host your own copy and train it however you'd like on whatever material you've got. (that's literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they've downloaded/purchased/stolen and then trained themselves. They aren't buying a CSAM generator ready to use off the open market... (nor are they getting this material from publicly operating AI models)
They are acquiring a tool and moulding it into a weapon of their own volition.
Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn't responsible for how you decide to use it.
Do... Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model's capabilities.
I think that’s a bit of a stretch. If it was being marketed as “make your fantasy, no matter how illegal it is,” then yeah. But just because I use a tool someone else made doesn’t mean they should be held liable.
And if I prompted AI for something inappropriate, and it gave me a relevant image, then that means the AI had inappropriate material in its training data.
No, you keep repeating this but it remains untrue no matter how many times you say it. An image generator is able to create novel images that are not directly taken from its training data. That's the whole point of image AIs.
An image generator is able to create novel images that are not directly taken from its training data. That's the whole point of image AIs.
I just want to clarify that you've bought the Silicon Valley hype for AI, but that is very much not the truth. It can create nothing novel - it can merely combine concepts and themes and styles in an incredibly complex manner... but it can never create anything novel.
What it's able and intended to do is beside the point, if it's also capable of generating inappropriate material.
Let me spell it more clearly. AI wouldn't know what a pussy looked like if it was never exposed to that sort of data set. It wouldn't know other inappropriate things if it wasn't exposed to that data set either.
Do you see where I'm going with this? AI only knows what people allow it to learn...
You realize that there are perfectly legal photographs of female genitals out there? I've heard it's actually a rather popular photography subject on the Internet.
Do you see where I'm going with this? AI only knows what people allow it to learn...
Yes, but the point here is that the AI doesn't need to learn from any actually illegal images. You can train it on perfectly legal images of adults in pornographic situations, and also perfectly legal images of children in non-pornographic situations, and then when you ask it to generate child porn it has all the concepts it needs to generate novel images of child porn for you. The fact that it's capable of that does not in any way imply that the trainers fed it child porn in the training set, or had any intention of it being used in that specific way.
As others have analogized in this thread, if you murder someone with a hammer that doesn't make the people who manufactured the hammer guilty of anything. Hammers are perfectly legal. It's how you used it that is illegal.
Yes. You're saying that the AI trainers must have had CSAM in their training data in order to produce an AI that is able to generate CSAM. That's simply not the case.
You also implied earlier on that these AIs "act or respond on their own", which is also not true. They only generate images when prompted to by a user.
The fact that an AI is able to generate inappropriate material just means it's a versatile tool.
Alright, well let's play an innocent hypothetical here.
Let's pretend you only know some magic word model (doesn't exist without thousands or millions of images by the way).
But anyways, let's say you're the AI. Now, with no vision of the world, what would you, as an AI, say if I asked you about how crescent wrenches and channel locks reproduced?
Now try the same hypothetical question again. This time, you actually have a genuine set of images of clean new tools, plus information that tools can't reproduce.
And now let's go to the modern day, where AI has zillions of images of rusty redneck toolboxes and a bunch of janky dialogue.
I'm not sure why you're picking this situation for an anti-AI rant. Of course there are a lot of ways that large companies will try to use AI that will harm society. But this is a situation where we already have laws on the books to lock up the people who are specifically doing terrible things. Good.
If you want to try to stand up and tell us about how AI is going to damage society, pick an area where people are using it legally and show us the harms there. Find something that's legal but immoral and unethical, and then you'll get a lot of support.