Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT
Always love those answers: well, if you read the 700-page white paper on this one command set in one module, then you would understand… Do you think I have the time to read 37,000 pages of bland-ass documentation yearly on top of doing my actual job? Come the fuck on.
I guess some of these guys have so many heads on their crews that they don't have much work to do anymore, but that's not the case for most.
This would have been probably early last year? I had to look up how to do something in Fortran (because Fortran), and the answer was very much in the voice of that one guy on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.
Tried again while chatting with an old work buddy a few months back, and it looks like they've updated it to acknowledge that F95 and F03 exist. So I assume that was all Stack Overflow.
Well, it is important to comply with the terms of service established by the website. It is highly recommended to familiarize oneself with the legally binding documents of the platform, including the Terms of Service (Section 2.1), User Agreement (Section 4.2), and Community Guidelines (Section 3.1), which explicitly outline the obligations and restrictions imposed upon users. By refraining from engaging in activities explicitly prohibited within these sections, you will be better positioned to maintain compliance with the platform's rules and regulations and not receive email bans in the future.
It's way easier to figure that out than to check ChatGPT hallucinations. There's usually someone saying why a response on SO is wrong, either in another response or in a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover it the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.
When you paste that code you do it in your private IDE, in a dev environment and you test it thoroughly before handing it off to the next person to test before it goes to production.
Hitting up ChatGPT for the answer to a question that you then vomit out in a meeting as if it's knowledge is totally different.
If you use LLMs in your professional work, you're crazy. I would never be comfortable opening myself up to the legal and security liabilities of AI tools.
If you use LLMs in your professional work, you're crazy
Eh, we use Copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with Stack Overflow), that's kind of on you for not understanding what you're doing.
I feel like it had to cause an actual disaster, with assets getting destroyed, to become part of common knowledge (like the Challenger shuttle or something).
Maybe for people who have no clue how to work with an LLM. They don't have to be perfect to still be incredibly valuable, I make use of them all the time and hallucinations aren't a problem if you use the right tools for the job in the right way.
The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I'll pass on that.
This. I use LLMs for work, primarily to help create extremely complex nested functions.
I don't count on LLMs to create anything new for me, or to provide any data points. I provide the logic and explain exactly what I want in the end.
I take a process which normally takes 45 minutes daily, test it once, and now I have reclaimed 43 extra minutes of my time each day.
It’s easy and safe to test before I apply it to real data.
It’s missed the mark a few times as I learned how to properly work with it, but now I’m consistently getting good results.
Other use cases are up for debate, but I agree when used properly hallucinations are not much of a problem. When I see people complain about them, that tells me they’re using the tool to generate data, which of course is stupid.
If they manage to strip any concept of authenticity, ownership or obligation from the entirety of human output and stick it behind a paywall, that's pretty much the whole ball game.
If we decide later that this is actually a really bullshit deal -- that they get everything for free and then sell it back to us -- then they'll surely get some sort of grandfather clause because "Whoops, we already did it!"
I've tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I've generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.
For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn't coming to me. A quick "make this cleaner: <code>" and I was back to the rest of the code.
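To illustrate the kind of cleanup that works well (a made-up example, not the actual code from that day):

```python
# Made-up example of a "make this cleaner" request. The verbose original:
def categorize(n):
    if n < 0:
        result = "negative"
    else:
        if n == 0:
            result = "zero"
        else:
            result = "positive"
    return result

# What a typical cleanup pass comes back with (behavior unchanged):
def categorize_clean(n):
    if n < 0:
        return "negative"
    return "zero" if n == 0 else "positive"
```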
This is what LLMs are currently good for. They're just another tool, like tab completion or code linting.
I use it all the time and it's brilliant when you put in the basic effort to learn how to use it effectively.
It's allowing me and other open source devs to increase the scope and speed of our contributions; just talking through problems is invaluable. Greedy, selfish people wanting to destroy things that help so many is exactly the rolling-coal mentality: fuck everyone else, I don't want the world to change around me! Makes me so despondent about the future of humanity.
Have you tried recent models? They're not perfect no, but they can usually get you most of the way there if not all the way. If you know how to structure the problem and prompt, granted.
We already have those near constantly. And we still keep asking queries.
People assume that LLMs need to be ready to replace a principal engineer, or a doctor or lawyer with decades of experience.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review, or has had to do rounds with medical interns, knows… they are idiots who need people to check their work constantly. An LLM making up some functions because it saw them on Stack Overflow but never tested them is no different from a hotshot intern who copied some code from Stack Overflow and never tested it.
This is already at the point where we can replace an intern or one of the less good junior engineers.
This is a bad thing.
Not just because it will put the people you're talking about out of work in the short term, but because it will prevent the next generation of developers from getting that low-level experience. They're not "idiots", they're inexperienced. They need to get experience. They won't if they're replaced by automation.
So, the whole point of learning is to ask questions of people who know more than you, so that you can gain the knowledge you need to succeed…
So… if you try to use these LLMs to replace parts of sectors where there need to be people who can work their way up to the next tier as they learn more and get better in their respective fields, you do realize that eventually there will no longer be people who can move up to the next tier/position, because people like you said "Fuck 'em, all in on this stupid LLM bullshit!" So now there are no more doctors, or real programmers, because people like you thought it would be just the GREATEST idea to replace humans with fucking LLMs.
You do see that, right?
Calling people fucking stupid, because they are learning, is actually pretty fucking stupid.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review, or has had to do rounds with medical interns, knows… they are idiots who need people to check their work constantly.
Do so at your own peril. Because the thing is, a person will learn from their mistakes and grow in knowledge and experience over time. An LLM is unlikely to do the same in a professional environment for two big reasons:
The company using the LLM would have to send data back to the creator of the LLM. This means their proprietary work could be at risk. The AI company could scoop them, or a data leak would be disastrous.
Alternatively, the LLM could self-learn and be kept solely in house, without any external data connections. An LLM company will never go for this, because it would mean their model is improving and developing out of their control. The customer's customized version may end up being better than the LLM company's own future releases. Or something might go terribly wrong with the model while it learns and adapts; even if the LLM company isn't held legally liable, they're still going to lose that business going forward.
On top of that, you need your inexperienced noobs to one day become the ones checking the output of an LLM. They can't do that unless they get experience doing the work. Companies already have proprietary models that just require the right inputs and pressing a button. Engineers are still hired, though, to interpret the results, know which inputs are the right ones, and understand how the model works.
A company that tries replacing them with LLMs is going to lose in the long run to competitors.
You can be killed with steel, which has a lot of other implications on what you do in order to avoid getting killed with steel.
Does steel fuck it all up?
Centralization is a shitty, backwards idea. But you have to be very conscious of yourself and your instincts, and neuter the part of you that tells you it isn't, to understand that.
Distributism minus Catholicism is just so good. I always return to it when I give up on trying to find a future in some other political ideology.
This has nothing to do with centralization. AI companies are already scraping the web for everything useful. If you took the content from SO and split it into 1000 federated sites, it would still end up in an AI model. Decentralization would only help if we ever manage to hold the AI companies accountable for the en-masse copyright violations they base their industry on.
Hmmm, maybe the Catholic part isn't the only part worth reviewing.
Also worth noting that the Conservative Party's 'Big Society' schtick in 2010 was wrapped in the trappings of distributism.
Not that all this diminishes it entirely, but it does seem to be a gateway drug for exploitation by the right.
I gotta hold my hand up and state that I am not read up on it at all, so happy to be corrected. But my impression is that Pope Leo XIII's conception was to reduce secular power so as to leave a void for the church to fill. And it's the potential exploitation of that void that attracts the far right too.
I don't know. It feels a bit like "When I quit my employer will realize how much they depended on me." The realization tends to be on the other side.
But while SO may keep functioning fine, it would be great if this caused other places to spring up as well. Reddit and X/Twitter are still there, but I'm glad we have the fediverse.
First, they sent the missionaries. They built communities, facilities for the common good, and spoke of collaboration and mutual prosperity. They got so many of us to buy into their belief system as a result.
Then, they sent the conquistadors. They took what we had built under their guidance, and claimed we "weren't using it" and it was rightfully theirs to begin with.
They are also retained by anyone who has archived them, like OpenAI or Google, thus making their AIs more valuable.
To really pull up the ladder, they will have to protest the Internet Archive and Common Crawl, too. It's just typical right-wing bullshit; acting on emotion and against their own interests.
Stack Overflow was great when it appeared. The info was spread out incredibly wide and there was a lot of really shitty info as well. One place where it could accumulate and be rated was extremely helpful.
But maybe it's time to create a federated open source Stack Overflow.
I once managed to find a pretty good alternative, but then I forgot its name. It was a very chill community, unlike what Stack Overflow had recently become with its toxicity (the properly-formatted-question police, people being offended over less popular languages, etc.).
At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.
There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.
If you're gonna use something for free then make the product of it free too.
An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don't mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.
An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.
This seems like a very fair and reasonable way to deal with the issue.
Agreed on that last part; making that the default would be a great solution. I could also use a signature in comments, like that guy who always appends the "Commercial AI thingy", but automatically.
I think you still have to have an account (last time I used it, anyway), but you're right, there is a tier you don't have to pay any money for. It's just an email address, but whatever. You can use it via their website, but AFAIK they haven't released a free model based on the data they've scraped off us, so you can't host it on your own hardware and properly do what you want with it. I have heard, though, that commercial websites were/are using ChatGPT bots for customer service, and you can easily use the customer-service chatbots on their websites to do other random stuff, like writing bash scripts or making yo mama jokes.
Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.
That license would require ChatGPT to provide attribution every time it used anyone's content there as training data, and would also require every output using that training data to be placed under the same license. That would legally prevent anything ChatGPT created, even in part, from this training data from being closed source. Since they obviously aren't planning on doing that, this is massively shitting on the concept of licensing.
CC attribution doesn't necessarily require the credits to sit immediately next to the content, but it would result in one of the world's longest web pages: it would need the name of the poster and a link for every single comment used as training data, and Stack Overflow has roughly 60 million questions and answers combined.
Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.
I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.
It does help to know what those funny letters mean. Now we wait for the regulators to catch up…
/tangent
If anything, we're a very long way from anything close to intelligent. OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being "AGI", and now more and more data is necessary to achieve that.
If you know the internet, you know there's a lot of garbage. I for one can't wait for garbage-in garbage-out to start taking its toll.
Also, I'm surprised how well open source models have shaped up; it's certainly worth a look. I occasionally use a local model for "brainstorming" in the loosest sense, as I generally know what I'm expecting, but it's sometimes helpful to see the tasks laid out. There's also comfort in that nothing ever needs to leave my network, and in a pinch I even got answers when my network was offline.
It gives a little hope, while corps get to blatantly violate copyright even as they wield it so heavily, that the advancements in open source have been so great.
You really don't need anything near as complex as AI… a simple script could be configured to automatically close the issue as solved, with a link to a randomly selected unrelated issue.
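A minimal sketch of such a script (the issue tracker and its data are hypothetical stand-ins; this just picks a random unrelated issue to "close as duplicate of"):

```python
import random

# Hypothetical in-memory stand-in for an issue tracker's open issues.
open_issue_ids = [101, 204, 317, 480, 512]

def auto_close(issue_id: int) -> str:
    """Close an issue as 'solved' with a link to a random unrelated issue."""
    unrelated = random.choice([i for i in open_issue_ids if i != issue_id])
    return f"Issue #{issue_id} closed as duplicate of #{unrelated}. Please search before posting."

print(auto_close(204))
```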
The enshittification is very real and is spreading constantly. Companies will leech more from their employees and users until things start to break down. Acceleration is the only way.
I mean, sure but in the context of individual websites I don't see it being a big deal. There will be replacements, and relatively quickly. Accelerationism applied to major societal structures is a terrible idea though.
Except it's not like a plane because we can stop using specific websites whenever we like, and build our own websites to whittle away at their hegemony.
I despise this use of mod power in response to a protest. It's our content to be sabotaged if we want - if Stack Overlords disagree then to hell with them.
I'll add Stack Overflow to my personal ban list, just below Reddit.
Once submitted to Stack Overflow/Reddit/literally every platform, it's no longer your content. It sucks, but you've implicitly agreed to it when creating your account.
While true, it's stupid that things are that way. They shouldn't be able to hide behind the idea that "we're not responsible for what our users publish, we're more like a public forum" while also having total ownership over that content.
you’ve implicitly agreed to it when creating your account
Many people would agree with that, and probably most laws do. However, I doubt many users have actually bothered to read the unnecessarily long document, fewer have understood the legalese, and the terms have likely already been changed ~pray I don't alter it any further~. That's a low and shady bar for consent. It indeed sucks, and I think people should leave those platforms, but I'm also open to laws that would invalidate that part of the EULA.
At this point I’m assuming most if not all of these content deals are essentially retroactive. They already scrapped the content and found it useful enough to try and secure future use, or at least exclude competitors.
Honestly? I'm down with that. And when the LLMs end up pricing themselves out of usefulness, we'll still have the fediverse version. Having free sites on the net with solid crowd-sourced information is never a bad thing, even if other people pick up the data and use it.
It's when private sites like Duolingo and Reddit crowd source the information and then slowly crank down the free aspect that we have the problems.
Assuming the federated version allowed contributor-chosen licenses (similar to GitHub), any harvesting in violation of the license would be subject to legal action.
Contrast that with Stack Exchange, where I assume the terms dictated by Stack Exchange deprive contributors of recourse.
SO already was. Not even harvested as much as handed to them. Periodic data dumps and a general forced commitment to open information were a big part of the reason they won out over other sites that used to compete with them. SO most likely wouldn't have existed if Experts Exchange didn't paywall their entire site.
As with everything else, AI companies believe their training data falls under fair use, so they will discard the CC BY-SA 4.0 license requirements regardless of whether this deal exists. (And if a court ever finds it's not fair use, they are so many layers of fucked that this situation won't even register.)
Smells too much like Duolingo. Here, everyone jump in and answer all the questions. Five years later: ooh, look at this gold mine of community data we own…
Yeah but didn't you see the sovereign citizens who think licenses are magic posting giant copyright notices after their posts? Lol
It's so childish. AI tools will help billions of the poorest people access life-saving knowledge and services, and help open source devs like myself create tools that free people from the clutches of capitalism. But they like living in a world of inequity, because their generational wealth, earned from centuries of exploitation of the impoverished, allows them a better education, better healthcare, and better living standards than the billions of impoverished people on the planet; so they'll fight to maintain their privilege, even if they're fighting against their own life getting better too. The most pathetic thing is they pretend to be fighting a moral crusade, as if using the answers they freely posted and never expected anything in return for is a real injustice!
And yes, I know people are going to pretend they think tech bros won't allow poor people to use their tech, based on assuming that how everything always works will suddenly just flip into reverse at some point or something? Like how mobile phones are only for rich people, and only rich people can sell via the internet, and only rich people can start a YouTube channel…
And while it hurts now, it's REALLY going to hurt when large swaths of useful answers that don't exist anywhere else are gone and there's nothing replacing them.
No one writes hundreds of pages of documentation for their stuff anymore. Without the collected knowledge learned from experience there, what do we have?
Unless we have source code to read, very little.
I'm still feeling the pain of Google search results sucking, combined with most of the large coding forums being gone and Reddit slowly going to garbage.
Stack Overflow was the last bastion of collected knowledge of its type… and it's not like it was 25 years ago, when we still had phonebook-sized manuals for almost all major software, because agile has killed the concept of exhaustive, definitive documentation for a given version of something.
I used to sorta roll my eyes at people shouting about federating everything, but at this point I'm scared and agreeing with them.
Vandalism is always reverted on SO, even if done by the original author. No knowledge is lost. Suing OpenAI for violating the CC BY-SA license might be possible, but I'd wager SO is not interested in suing them, and since they hold the rights, not much can be done by others.
It should stay for creative works, but that's it. It should protect people who actually write books, compose music, make art, and sing. It shouldn't be held forever by corporations leeching off their workers.
Why does OpenAI want 10-year-old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn't acceptable to re-ask as technology moves on?
A malicious response by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then an army of users upvoting the known-wrong answers while downvoting accurate ones. That would poison the data, I would think.
All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow.
This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.
Interestingly, I see nothing in that policy that would disallow machine-generated downvotes on proper answers and machine-generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, it looks like Stack Overflow is perfectly fine with bots voting.
Is there an actual way to stop it? I don't think so. At the least, moving to the fediverse would stop any particular corporation from having a monopoly on it, prevent Reddit-like abuses of power, and give users more power, among a few other things.
That is how it started. It was a non-profit with the goal to release all their patents and research for free.
That lasted for a few years, and then the people running it realized they could instead all become filthy rich and nobody could do anything about it. So they did that.
But don't worry, they're a capped for-profit now! They can only make 100 times the amount of money they took in investment. So they'll stop when they've reached… checks notes… around $1.3 trillion.
I will answer some questions with my old account using GPT-4, to poison the data.
If you want to poison SO a little while still providing valid answers that help users, use an outlook.com email address for new accounts. It seems to lack anti-throwaway countermeasures while still being accepted by SO. And it seems fitting to bash the corporate with the corporate.
If we can't delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI we could have it help us generate them.
Poison the well by using AI-generated comments and answers. There isn't currently a way to reliably determine whether content is human- or AI-generated, and training AI on AI is the equivalent of inbreeding.
You literally have the same mentality as the coal rollers.
This is tech that could improve life for everyone, and instead of using it to make open source software or to code solutions to problems, you attack it like a crab in a bucket, simply because you fear change.
Instead of solely deleting content, what if authors had instead moved their content/answers to something self-owned? Can SO even legally claim ownership of the content on their site? Seems iffy, in my own ignorant take.
Regardless of the license (apart perhaps from public domain), it is legally still your copyright, since you produced the content. I'm pretty sure that in the EU they cannot prevent you from deleting your content.
So does that mean anyone is allowed to use said content for whatever purposes they'd like? That'd include AI stuff too, I think? Interesting twist there; I hadn't thought about it like this yet. Essentially, posters would be agreeing to share that data/info publicly. No different from someone learning how to code by looking at examples made by their professors, or whoever else is doing the teaching/talking, I suppose. Hmm.
Well, I suppose in that case protesting via removal is fine, IMO. I think the constructive next step would be to create a site where you, the user, own what you post. Does Reddit claim ownership over posts? I wonder what Lemmy's "policies" are, and if this would be good grounds (here) to start building something better than what SO was doing.
For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.
I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles: literally so they would have more material to feed into their LLMs. I fully expect Reddit to be bought out/merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up in an LLM somewhere.
I think auto-generated subtitles were created to fulfill an FCC requirement for content subtitling some years ago. They have, however, turned out to be super useful for LLM feeding.
It’s true that it’s mostly a symbolic act, but the rebellion matters, especially from old accounts. It’s also a nice way to mark the time after which I never participated in SO again. After my ban expires, I’ll deface my questions again. And again. Until they permaban me.
There’s also the possibility of adding to the wonderful irony of making the AI more useful than the original by having content that’s no longer accessible through through the original. It doesn’t get more enshittified than that, even if Prashanth Chandrasekar is too out of touch to ever regret his decision.
I think you're 100% correct in assuming they've already fed it data scraped from SO. I've previously gotten code samples from ChatGPT that were clearly from SO, down to the comments in the code. I even reverse-searched some of the code and found the question it came from.
They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.
I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).
And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.
So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.
If you have low karma, then edits are reviewed by multiple people before the edit is saved. That's primarily in place to prevent spammers, who could otherwise post a valid question and then edit it a few months later, transforming the message into a link to some shitty website.
Even with high karma, that just means your edit is temporarily trusted. It gets reviewed, and will be reverted if it's a bad edit.
And any time an edit is reverted, that's a knock against your karma. There's a community enforced requirement for all edits to be a measurable improvement.
Even moderation decisions are reviewed by multiple people - so if someone rejects a post because it's spam, when they should have rejected it because it's off topic (or approved it) then that is also going to be caught and undone. And any harmful contribution (edit or moderation decision) will result in your action being undone and your karma going down. If your karma goes down too fast, your access to the site is revoked. If you do something really bad, then they'll ban your IP address.
Moderators can also lock a controversial post, so only people with high karma can touch it at all.
... keep in mind Stack Overflow doesn't just allow editing your own posts; you can edit any content on the website, similar to Wikipedia.
It's honestly a good overall approach, but around when Jeff Atwood left in 2012 it started drifting off course towards the shit show that is Stack Overflow today.
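A toy model of the review flow described above (the thresholds and karma deltas are invented for illustration; SO's real reputation rules are more elaborate):

```python
# Toy model of the edit-review flow described above. Numbers are invented
# for illustration; Stack Overflow's real reputation system is more elaborate.
TRUSTED_KARMA = 2000  # hypothetical cutoff for immediately-trusted edits

def submit_edit(editor_karma: int, reviewers_approve: bool):
    """Return (edit_published, karma_delta) under this toy model."""
    if editor_karma < TRUSTED_KARMA:
        # Low-karma edits sit in a review queue before going live.
        return (reviewers_approve, +2 if reviewers_approve else -2)
    # High-karma edits go live immediately but are still reviewed after
    # the fact; a bad edit gets reverted and knocks your karma.
    return (reviewers_approve, 0 if reviewers_approve else -2)

print(submit_edit(150, reviewers_approve=True))    # (True, 2): approved in queue
print(submit_edit(5000, reviewers_approve=False))  # (False, -2): trusted, then reverted
```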
Maybe a better act of rebellion would be to scrape the data on stack, self host it, and move to an open source platform. Easy for me to say though, when I only ever coded Hello World
Maybe we should start asking questions that iterate loops billions of times. Something semi-malicious that a person would recognize but an AI wouldn't.
Nah, the training data probably doesn't quite work that way. The AI is very unlikely to test code; it just regurgitates the most likely response based on its training sets. Instead, filling posts with random bits and pieces of unrelated code and responses might be better.
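For illustration, the kind of snippet the earlier comment imagines might look like this: hypothetical code that reads as a harmless teaching example but hides billions of iterations if actually run.

```python
# Hypothetical illustration: looks like innocent example code, but hides
# 2000**3 = 8 billion iterations of busywork if executed as-is.
total = 0
for i in range(2000):
    for j in range(2000):
        for k in range(2000):  # a human spots the blowup; a model regurgitating it wouldn't
            total += (i ^ j ^ k) & 1
print(total)
```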
This sort of thing is so self-sabotaging. The website already has your comment, and a license to use it. By deleting your stuff from the web you only ensure that the AI is definitely going to be the better resource to go to for answers.
Not when you've agreed to terms of service that hand over ownership of your content to Stack Overflow, leaving you merely licensed to use your own content.
Also: backups and deletion flags. Whatever comment you submitted has likely been backed up already, and even if you click the delete button, you're likely just flipping a flag.
Frankly, the solution here isn't vandalism; it's setting up a competing site and copying the content over. The license of Stack Overflow makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.
I’m not saying vandalism is illegal. I’m say that it borders on immoral and that there is a better, more radical (and thus effective) alternative that one might expect to be illegal but in fact isn’t.
It will not make a difference. The internet is free and open by design; you can scrape it at any time. A partnership does nothing but make it a little more convenient for them.
Anyone care to explain why people would care that they posted to a public forum that they don't own, with content that is now further being shared for public benefit?
The argument that it's your content becomes false as soon as you share it with the world.
I can only really speak to reddit, but I think this applies to all of the user generated content websites. The original premise, that everyone agreed to, was the site provides a space and some tools and users provide content to fill it. As information gets added, it becomes a valuable resource for everyone. Ads and other revenue streams become a necessary evil in all this, but overall directly support the core use case.
Now that content is being packaged into large language models to be either put behind a paywall or packed into other non-freely available services. Since they no longer seem interested in supporting the model we all agreed on, I see no reason to continue adding value and since they provided tools to remove content I may as well use them.
But from the very beginning, years ago, it was understood that when you post on these types of sites, the data is not yours, or at least you give them license to use it how they see fit. For years people accepted that, but now they're whining because they aren't getting paid for something they gave away.
It's not shared for public benefit, though. OpenAI, despite the Open in their name, charges for access to their models. You either pay with money or (meta)data, depending on the model.
Legally, sure. You signed away your rights to your answers when you joined the forum. Morally, though?
People are pissed that SO, which was actively encouraging mods to use AI-detection software to prevent any LLM usage in posted questions and answers, is now selling the publicly accessible data, made by their users for free, to a closed-source for-profit entity that refuses to open itself up.
Why?? Please make this make sense. Having AI help with coding is ideal, and probably its greatest immediate use case. The web is an open resource. Why die on this stupid hill instead of advocating for a privacy argument that actually matters?
Edit: Okay got it. Hinder significant human progress because a company I don't like might make some more money from something I said in public, which has been a thing literally forever. You guys really lack a lot of life skills about how the world really works huh?
Because being able to delete your data from social networks you no longer wish to participate in, or that have banned you, as long as they haven't specifically paid you for your contributions, is a privacy argument that actually matters, regardless of and independent of AI.
In regards to AI, the problem is not with AI in general, but with proprietary for-profit AI being trained on open resources, even those with underlying license agreements that prevent that information from being monetized.
Because none of the big companies listen to the privacy argument. Or any argument, really.
AI in itself is good, amazing, even.
I have no issue with open-source, ideally GPL- or similarly licensed AI models trained on Internet data.
But involuntarily participating in training closed-source corporate AIs… no, thanks. That shit should go to the hellhole it was born in, and we should do our best to destroy it, not advocate for it.
If you care about the future of AI, OpenAI should long be on your enemy list. They expropriated an open model, they were hypocritical enough to keep "open" in the name, and then they essentially sold themselves to Microsoft. That's not the AI future we should want.
We're in a capitalist system, and these are for-profit companies, right? What do you think their goal is? It isn't to help you. It's to increase profits. That will probably lead to massive numbers of jobs being replaced with AI, and we will get nothing for giving them the data to train on. It's purely parasitic. You should not advocate for it.
If it's open and not-for-profit, it can maybe do good, but there's no way this will.
Meta and Google have done more for open source AI than anyone else. I think a lot of antis don't really understand how computer science works, so they imagine it's like collecting up physical iron and taking it into a secret room, never to be seen again.
The actual tools and math are what's important. Research on best methods is complex and slow, but so far all these developments are being written up in papers that anyone can learn from. If people on the left weren't so performative and lazy, we could have our own AI too.
Human progress is spending cities' worth of electricity and water to ask Copilot how to use a library and have it lie back to you in natural language? Please, make this make sense.
Why do people roll coal? Why do they vandalize electric car chargers? Why do people tie ropes across bike lanes?
Because a changing world is scary and people lash out at new things.
The coal rollers think they're fighting a valiant fight against evil corporations too. They invested their effort into being a car guy, and it doesn't feel fair that things are changing, so they want to hurt the people benefiting from the new tech.
The deeper I get into this platform the more I realize the guise of being 'progressive, left, privacy-conscious, tech inclined' is literally the opposite.
Hating on everything AI is trendy nowadays. Most of these people can't give you any coherent explanation for why. They just adopt the attitude of the people around them, who also don't know why.
I believe the general reasoning is something along the lines of not wanting bad corporations to profit from their content for free. So it's just a matter of principle for the most part. Perhaps we need to wait for someone to train LLM on the freely available to everyone data on Lemmy and then we can interview it to see what's up.
Mega-corporations like Microsoft and Google are evil. Very easy explanation. Even if it were a good open source company scraping the data to train AI models, people should be free to delete the data they input. It's pretty simple to understand.