Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT
Always love those answers: well, if you read the 700-page white paper on this one command set in one module, then you would understand… Do you think I have the time to read 37,000 pages of bland-ass documentation yearly on top of doing my actual job? Come the fuck on.
I guess some of these guys have so many heads on their crews that they don't have much work to do anymore, but that's not the case for most.
This would have been probably early last year? I had to look up how to do something in Fortran (because Fortran), and the answer was very much in the voice of that one guy on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.
Tried again while chatting with an old work buddy a few months back, and it looks like they've updated it to acknowledge that F95 and F03 exist. So I assume that was all Stack Overflow.
Well, it is important to comply with the terms of service established by the website. It is highly recommended to familiarize oneself with the legally binding documents of the platform, including the Terms of Service (Section 2.1), User Agreement (Section 4.2), and Community Guidelines (Section 3.1), which explicitly outline the obligations and restrictions imposed upon users. By refraining from engaging in activities explicitly prohibited within these sections, you will be better positioned to maintain compliance with the platform's rules and regulations and not receive email bans in the future.
It's way easier to figure that out than to check ChatGPT hallucinations. There's usually someone saying why a response on SO is wrong, either in another response or in a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover it the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.
When you paste that code you do it in your private IDE, in a dev environment and you test it thoroughly before handing it off to the next person to test before it goes to production.
Hitting up ChatGPT for the answer to a question that you then vomit out in a meeting as if it's knowledge is totally different.
If you use LLMs in your professional work, you're crazy. I would never be comfortable opening myself up to the legal and security liabilities of AI tools.
If you use LLMs in your professional work, you're crazy
Eh, we use Copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with Stack Overflow), that's kind of on you for not understanding what you're doing.
I feel like it had to cause an actual disaster, with assets getting destroyed, to become part of common knowledge (like the Challenger shuttle or something).
Maybe for people who have no clue how to work with an LLM. They don't have to be perfect to still be incredibly valuable, I make use of them all the time and hallucinations aren't a problem if you use the right tools for the job in the right way.
The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I'll pass on that.
This. I use LLMs for work, primarily to help create extremely complex nested functions.
I don't count on LLMs to create anything new for me, or to provide any data points. I provide the logic and explain exactly what I want in the end.
I take a process which normally takes 45 minutes daily, test it once, and now I have reclaimed 43 extra minutes of my time each day.
It’s easy and safe to test before I apply it to real data.
It’s missed the mark a few times as I learned how to properly work with it, but now I’m consistently getting good results.
Other use cases are up for debate, but I agree when used properly hallucinations are not much of a problem. When I see people complain about them, that tells me they’re using the tool to generate data, which of course is stupid.
If they manage to strip any concept of authenticity, ownership or obligation from the entirety of human output and stick it behind a paywall, that's pretty much the whole ball game.
If we decide later that this is actually a really bullshit deal -- that they get everything for free and then sell it back to us -- then they'll surely get some sort of grandfather clause because "Whoops, we already did it!"
I've tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I've generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.
For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn't coming to me. A quick "make this cleaner: <code>" and I was back to the rest of the code.
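To illustrate the kind of cleanup that works well (a made-up example, not the actual code from that day):

```python
# Made-up example of a "make this cleaner" request. The verbose original:
def categorize(n):
    if n < 0:
        result = "negative"
    else:
        if n == 0:
            result = "zero"
        else:
            result = "positive"
    return result

# What a typical cleanup pass comes back with (behavior unchanged):
def categorize_clean(n):
    if n < 0:
        return "negative"
    return "zero" if n == 0 else "positive"
```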
This is what LLMs are currently good for. They're just another tool, like tab completion or code linting.
I use it all the time and it's brilliant when you put in the basic effort to learn how to use it effectively.
It's allowing me and other open source devs to increase the scope and speed of our contributions; just talking through problems is invaluable. Greedy, selfish people wanting to destroy things that help so many is exactly the rolling-coal mentality: fuck everyone else, I don't want the world to change around me! Makes me so despondent about the future of humanity.
Have you tried recent models? They're not perfect no, but they can usually get you most of the way there if not all the way. If you know how to structure the problem and prompt, granted.
We already have those near constantly. And we still keep asking queries.
People assume that LLMs need to be ready to replace a principal engineer, or a doctor or lawyer with decades of experience.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review, or has had to do rounds with medical interns, knows… they are idiots who need people to check their work constantly. An LLM making up some functions because it saw them on Stack Overflow but never tested them is no different from a hotshot intern who copied some code from Stack Overflow and never tested it.
This is already at the point where we can replace an intern or one of the less good junior engineers.
This is a bad thing.
Not just because it will put the people you're talking about out of work in the short term, but because it will prevent the next generation of developers from getting that low-level experience. They're not "idiots", they're inexperienced. They need to get experience. They won't if they're replaced by automation.
So, the whole point of learning is to ask questions of people who know more than you, so that you can gain the knowledge you need to succeed…
So… if you try to use these LLMs to replace parts of sectors where there need to be people who can work their way up to the next tier as they learn more and get better in their respective fields, you do realize that eventually there will no longer be people who can move up to the next tier/position, because people like you said "Fuck 'em, all in on this stupid LLM bullshit!" So now there are no more doctors, or real programmers, because people like you thought it would be just the GREATEST idea to replace humans with fucking LLMs.
You do see that, right?
Calling people fucking stupid, because they are learning, is actually pretty fucking stupid.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review, or has had to do rounds with medical interns, knows… they are idiots who need people to check their work constantly.
Do so at your own peril. Because the thing is, a person will learn from their mistakes and grow in knowledge and experience over time. An LLM is unlikely to do the same in a professional environment for two big reasons:
The company using the LLM would have to send data back to the creator of the LLM. This means their proprietary work could be at risk. The AI company could scoop them, or a data leak would be disastrous.
Alternatively, the LLM could self-learn and be kept solely in house, without any external data connections. An LLM company will never go for this, because it would mean their model is improving and developing out of their control. The customer's customized version may end up being better than the LLM company's own future releases. Or something might go terribly wrong with the model while it learns and adapts; even if the LLM company isn't held legally liable, they're still going to lose that business going forward.
On top of that, you need your inexperienced noobs to one day become the ones checking the output of an LLM. They can't do that unless they get experience doing the work. Companies already have proprietary models that just require the right inputs and pressing a button. Engineers are still hired, though, to interpret the results, know which inputs are the right ones, and understand how the model works.
A company that tries replacing them with LLMs is going to lose in the long run to competitors.
You can be killed with steel, which has a lot of other implications on what you do in order to avoid getting killed with steel.
Does steel fuck it all up?
Centralization is a shitty, backwards idea. But you have to be very conscious of yourself and your instincts, and neuter the part of you that tells you it isn't, to understand that.
Distributism minus Catholicism is just so good. I always return to it when I give up on trying to find a future in some other political ideology.
This has nothing to do with centralization. AI companies are already scraping the web for everything useful. If you took the content from SO and split it into 1000 federated sites, it would still end up in an AI model. Decentralization would only help if we ever manage to hold the AI companies accountable for the en-masse copyright violations they base their industry on.
Hmmm, maybe the Catholic part isn't the only part worth reviewing.
Also worth noting that the Conservative Party's 'Big Society' schtick in 2010 was wrapped in the trappings of distributism.
Not that all this diminishes it entirely, but it does seem to be a gateway drug for exploitation by the right.
I gotta hold my hand up and state that I am not read up on it at all, so happy to be corrected. But my impression is that Pope Leo XIII's conception was to reduce secular power so as to leave a void for the church to fill. And it's the potential exploitation of that void that attracts the far right too.
I don't know. It feels a bit like "When I quit my employer will realize how much they depended on me." The realization tends to be on the other side.
But while SO may keep functioning fine, it would be great if this caused other places to spring up as well. Reddit and X/Twitter are still there, but I'm glad we have the fediverse.
First, they sent the missionaries. They built communities, facilities for the common good, and spoke of collaboration and mutual prosperity. They got so many of us to buy into their belief system as a result.
Then, they sent the conquistadors. They took what we had built under their guidance, and claimed we "weren't using it" and it was rightfully theirs to begin with.
They are also retained by anyone who has archived them, like OpenAI or Google, thus making their AIs more valuable.
To really pull up the ladder, they will have to protest the Internet Archive and Common Crawl, too. It's just typical right-wing bullshit; acting on emotion and against their own interests.
Stack Overflow was great when it appeared. The info was spread out incredibly wide and there was a lot of really shitty info as well. One place where it could accumulate and be rated was extremely helpful.
But maybe it's time to create a federated open source Stack Overflow.
I once managed to find a pretty good alternative, but then I forgot its name. It was a very chill community, unlike what Stack Overflow had recently become with its toxicity (the properly-formatted-question police, people being offended over less popular languages, etc.).
At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.
There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.
If you're gonna use something for free then make the product of it free too.
An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don't mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.
An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.
This seems like a very fair and reasonable way to deal with the issue.
Agreed on that last part; making that the default would be a great solution. I could also use a signature in comments, like that guy who always appends the "Commercial AI thingy", but automatically.
I think you still have to have an account (last time I used it, anyway), but you're right, there is a tier you don't have to pay any money for. It's just an email address, but whatever. You can use it via their website, but AFAIK they haven't released a free model based on the data they've scraped off us, so you can't host it on your own hardware and properly do what you want with it. I have heard, though, that commercial websites were/are using ChatGPT bots for customer service, and you can easily use the customer-service chatbots on their websites to do other random stuff, like writing bash scripts or making yo mama jokes.
Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.
That license would require ChatGPT to provide attribution every time it used anyone's content there as training data, and would also require every output using that training data to be placed under the same license. That would legally prevent anything ChatGPT created, even in part, from this training data from being closed source. Since they obviously aren't planning on doing that, this is massively shitting on the concept of licensing.
CC attribution doesn't necessarily require the credits to sit immediately next to the content, but it would result in one of the world's longest web pages: it would need the name of the poster and a link for every single comment used as training data, and Stack Overflow has roughly 60 million questions and answers combined.
Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.
I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.
It does help to know what those funny letters mean. Now we wait for the regulators to catch up…
/tangent
If anything, we're a very long way from anything close to intelligent. OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being "AGI", and now more and more data is necessary to achieve that.
If you know the internet, you know there's a lot of garbage. I for one can't wait for garbage-in garbage-out to start taking its toll.
Also, I'm surprised how well open source models have shaped up; it's certainly worth a look. I occasionally use a local model for "brainstorming" in the loosest sense, as I generally know what I'm expecting, but it's sometimes helpful to see the tasks laid out. There's also comfort in that nothing ever needs to leave my network, and in a pinch I even got answers when my network was offline.
It gives a little hope, while corps get to blatantly violate copyright even as they wield it so heavily, that the advancements in open source have been so great.
You really don't need anything near as complex as AI… a simple script could be configured to automatically close the issue as solved, with a link to a randomly selected unrelated issue.
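A minimal sketch of such a script (the issue tracker and its data are hypothetical stand-ins; this just picks a random unrelated issue to "close as duplicate of"):

```python
import random

# Hypothetical in-memory stand-in for an issue tracker's open issues.
open_issue_ids = [101, 204, 317, 480, 512]

def auto_close(issue_id: int) -> str:
    """Close an issue as 'solved' with a link to a random unrelated issue."""
    unrelated = random.choice([i for i in open_issue_ids if i != issue_id])
    return f"Issue #{issue_id} closed as duplicate of #{unrelated}. Please search before posting."

print(auto_close(204))
```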
The enshittification is very real and is spreading constantly. Companies will leech more from their employees and users until things start to break down. Acceleration is the only way.
I mean, sure but in the context of individual websites I don't see it being a big deal. There will be replacements, and relatively quickly. Accelerationism applied to major societal structures is a terrible idea though.
Except it's not like a plane because we can stop using specific websites whenever we like, and build our own websites to whittle away at their hegemony.
I despise this use of mod power in response to a protest. It's our content to be sabotaged if we want - if Stack Overlords disagree then to hell with them.
I'll add Stack Overflow to my personal ban list, just below Reddit.
Once submitted to Stack Overflow/Reddit/literally every platform, it's no longer your content. It sucks, but you've implicitly agreed to it when creating your account.
While true, it's stupid that things are that way. They shouldn't be able to hide behind the idea that "we're not responsible for what our users publish, we're more like a public forum" while also having total ownership over that content.
you’ve implicitly agreed to it when creating your account
Many people would agree with that, and probably most laws do. However, I doubt many users have actually bothered to read the unnecessarily long document, fewer have understood the legalese, and the terms have likely already been changed ~pray I don't alter it any further~. That's a low and shady bar for consent. It indeed sucks, and I think people should leave those platforms, but I'm also open to laws that would invalidate that part of the EULA.
At this point I’m assuming most if not all of these content deals are essentially retroactive. They already scrapped the content and found it useful enough to try and secure future use, or at least exclude competitors.
Honestly? I'm down with that. And when the LLMs end up pricing themselves out of usefulness, we'll still have the fediverse version. Having free sites on the net with solid crowd-sourced information is never a bad thing, even if other people pick up the data and use it.
It's when private sites like Duolingo and Reddit crowd source the information and then slowly crank down the free aspect that we have the problems.
Assuming the federated version allowed contributor-chosen licenses (similar to GitHub), any harvesting in violation of the license would be subject to legal action.
Contrast that with Stack Exchange, where I assume the terms dictated by Stack Exchange deprive contributors of recourse.
SO already was. Not even harvested as much as handed to them. Periodic data dumps and a general forced commitment to open information were a big part of the reason they won out over other sites that used to compete with them. SO most likely wouldn't have existed if Experts Exchange didn't paywall their entire site.
As with everything else, AI companies believe their training data falls under fair use, so they will discard the CC BY-SA 4.0 license requirements regardless of whether this deal exists. (And if a court ever finds it's not fair use, they are so many layers of fucked that this situation won't even register.)
Smells too much like Duolingo. Here, everyone jump in and answer all the questions. Five years later: ooh, look at this gold mine of community data we own…
Yeah but didn't you see the sovereign citizens who think licenses are magic posting giant copyright notices after their posts? Lol
It's so childish. AI tools will help billions of the poorest people access life-saving knowledge and services, and help open source devs like myself create tools that free people from the clutches of capitalism. But they like living in a world of inequity, because their generational wealth, earned from centuries of exploitation of the impoverished, allows them a better education, better healthcare, and better living standards than the billions of impoverished people on the planet; so they'll fight to maintain their privilege, even if they're fighting against their own life getting better too. The most pathetic thing is they pretend to be fighting a moral crusade, as if using the answers they freely posted and never expected anything in return for is a real injustice!
And yes, I know people are going to pretend they think tech bros won't allow poor people to use their tech, based on assuming that how everything always works will suddenly just flip into reverse at some point or something? Like how mobile phones are only for rich people, and only rich people can sell via the internet, and only rich people can start a YouTube channel…
And while it hurts now, it's REALLY going to hurt when large swaths of useful answers that don't exist anywhere else are gone and there's nothing replacing them.
No one writes hundreds of pages of documentation for their stuff anymore. Without the collected knowledge learned from experience there, what do we have?
Unless we have source code to read, very little.
I'm still feeling the pain of Google search results sucking, combined with most of the large coding forums being gone and Reddit slowly going to garbage.
Stack Overflow was the last bastion of collected knowledge of its type… and it's not like it was 25 years ago, when we still had phonebook-sized manuals for almost all major software, because agile has killed the concept of exhaustive, definitive documentation for a given version of something.
I used to sorta roll my eyes at people shouting about federating everything, but at this point I'm scared and agreeing with them.
Vandalism is always reverted on SO, even if done by the original author. No knowledge is lost. Suing OpenAI for violating the CC BY-SA license might be possible, but I'd wager SO is not interested in suing them, and since they hold the rights, not much can be done by others.
It should stay for creative works, but that's it. It should protect people who actually write books, compose music, make art, and sing. It shouldn't be held forever by corporations leeching off their workers.
Why does OpenAI want 10-year-old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn't acceptable to re-ask as technology moves on?
A malicious response by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then an army of users upvoting the known-wrong answers while downvoting accurate ones. That would poison the data, I would think.
All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow.
This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.
Interestingly, I see nothing in that policy that would disallow machine-generated downvotes on proper answers and machine-generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, it looks like Stack Overflow is perfectly fine with bots voting.
Is there an actual way to stop it? I don't think so. At the least, moving to the fediverse would stop any particular corporation from having a monopoly on it, prevent Reddit-like abuses of power, and give users more power, among a few other things.
That is how it started. It was a non-profit with the goal to release all their patents and research for free.
That lasted for a few years, and then the people running it realized they could instead all become filthy rich and nobody could do anything about it. So they did that.
But don't worry, they're a capped for-profit now! They can only make 100 times the amount of money they took in investment. So they'll stop when they've reached… checks notes… around $1.3 trillion.
I will answer some questions with my old account using GPT-4, to poison the data.
If you want to poison SO a little while still providing valid answers that help users, use an outlook.com email address for new accounts. It seems to lack anti-throwaway countermeasures while still being accepted by SO. And it seems fitting to bash the corporate with the corporate.
If we can't delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI we could have it help us generate them.
Poison the well by using AI-generated comments and answers. There isn't currently a way to reliably determine whether content is human- or AI-generated, and training AI on AI is the equivalent of inbreeding.
You literally have the same mentality as the coal rollers.
This is tech that could improve life for everyone, and instead of using it to make open source software or to code solutions to problems, you attack it like a crab in a bucket, simply because you fear change.
Instead of solely deleting content, what if authors had instead moved their content/answers to something self-owned? Can SO even legally claim ownership of the content on their site? Seems iffy, in my own ignorant take.
Regardless of the license (apart perhaps from public domain), it is legally still your copyright, since you produced the content. I'm pretty sure that in the EU they cannot prevent you from deleting your content.
So does that mean anyone is allowed to use said content for whatever purposes they'd like? That'd include AI stuff too, I think? Interesting twist there; I hadn't thought about it like this yet. Essentially, posters would be agreeing to share that data/info publicly. No different from someone learning how to code by looking at examples made by their professors, or whoever else is doing the teaching/talking, I suppose. Hmm.
Well, I suppose in that case protesting via removal is fine, IMO. I think the constructive next step would be to create a site where you, the user, own what you post. Does Reddit claim ownership over posts? I wonder what Lemmy's "policies" are, and if this would be good grounds (here) to start building something better than what SO was doing.
For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.
I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles: literally so they would have more material to feed into their LLMs. I fully expect Reddit to be bought out/merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up in an LLM somewhere.
I think auto-generated subtitles were created to fulfill an FCC requirement for content subtitling some years ago. They have, however, turned out to be super useful for LLM feeding.
It’s true that it’s mostly a symbolic act, but the rebellion matters, especially from old accounts. It’s also a nice way to mark the time after which I never participated in SO again. After my ban expires, I’ll deface my questions again. And again. Until they permaban me.
There’s also the possibility of adding to the wonderful irony of making the AI more useful than the original by having content that’s no longer accessible through through the original. It doesn’t get more enshittified than that, even if Prashanth Chandrasekar is too out of touch to ever regret his decision.
I think you're 100% correct in assuming they've already fed it data scraped from SO. I've previously gotten code samples from ChatGPT that were clearly from SO, down to the comments in the code. I even reverse-searched some of the code and found the question it came from.
They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.
I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).
And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.
So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.
If you have low karma, then edits are reviewed by multiple people before the edit is saved. That's primarily in place to prevent spammers, who could otherwise post a valid question and then edit it a few months later, transforming the message into a link to some shitty website.
Even with high karma, that just means your edit is temporarily trusted. It gets reviewed, and will be reverted if it's a bad edit.
And any time an edit is reverted, that's a knock against your karma. There's a community enforced requirement for all edits to be a measurable improvement.
Even moderation decisions are reviewed by multiple people - so if someone rejects a post because it's spam, when they should have rejected it because it's off topic (or approved it) then that is also going to be caught and undone. And any harmful contribution (edit or moderation decision) will result in your action being undone and your karma going down. If your karma goes down too fast, your access to the site is revoked. If you do something really bad, then they'll ban your IP address.
Moderators can also lock a controversial post, so only people with high karma can touch it at all.
... keep in mind Stack Overflow doesn't just allow editing your own posts; you can edit any content on the website, similar to Wikipedia.
It's honestly a good overall approach, but around when Jeff Atwood left in 2012 it started drifting off course towards the shit show that is Stack Overflow today.
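A toy model of the review flow described above (the thresholds and karma deltas are invented for illustration; SO's real reputation rules are more elaborate):

```python
# Toy model of the edit-review flow described above. Numbers are invented
# for illustration; Stack Overflow's real reputation system is more elaborate.
TRUSTED_KARMA = 2000  # hypothetical cutoff for immediately-trusted edits

def submit_edit(editor_karma: int, reviewers_approve: bool):
    """Return (edit_published, karma_delta) under this toy model."""
    if editor_karma < TRUSTED_KARMA:
        # Low-karma edits sit in a review queue before going live.
        return (reviewers_approve, +2 if reviewers_approve else -2)
    # High-karma edits go live immediately but are still reviewed after
    # the fact; a bad edit gets reverted and knocks your karma.
    return (reviewers_approve, 0 if reviewers_approve else -2)

print(submit_edit(150, reviewers_approve=True))    # (True, 2): approved in queue
print(submit_edit(5000, reviewers_approve=False))  # (False, -2): trusted, then reverted
```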
Maybe a better act of rebellion would be to scrape the data on stack, self host it, and move to an open source platform. Easy for me to say though, when I only ever coded Hello World
Maybe we should start asking questions that iterate loops billions of times. Something semi-malicious that a person would recognize but an AI wouldn't.
Nah, the training data probably doesn't quite work that way. The AI is very unlikely to test code; it just regurgitates the most likely response based on its training sets. Instead, filling posts with random bits and pieces of unrelated code and responses might be better.
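For illustration, the kind of snippet the earlier comment imagines might look like this: hypothetical code that reads as a harmless teaching example but hides billions of iterations if actually run.

```python
# Hypothetical illustration: looks like innocent example code, but hides
# 2000**3 = 8 billion iterations of busywork if executed as-is.
total = 0
for i in range(2000):
    for j in range(2000):
        for k in range(2000):  # a human spots the blowup; a model regurgitating it wouldn't
            total += (i ^ j ^ k) & 1
print(total)
```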
This sort of thing is so self-sabotaging. The website already has your comment, and a license to use it. By deleting your stuff from the web you only ensure that the AI is definitely going to be the better resource to go to for answers.
Not when you've agreed to terms of service that hand over ownership of your content to Stack Overflow, leaving you merely licensed to use your own content.
Also: backups and deletion flags. Whatever comment you submitted has likely been backed up already, and even if you click the delete button, you're likely just flipping a flag.
Frankly, the solution here isn't vandalism; it's setting up a competing site and copying the content over. The license of Stack Overflow makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.
I’m not saying vandalism is illegal. I’m say that it borders on immoral and that there is a better, more radical (and thus effective) alternative that one might expect to be illegal but in fact isn’t.
It will not make a difference. The internet is free and open by design; you can scrape it at any time. A partnership does nothing but make it a little more convenient for them.
Anyone care to explain why people would care that they posted to a public forum that they don't own, with content that is now further being shared for public benefit?
The argument that it's your content becomes false as soon as you share it with the world.
I can only really speak to reddit, but I think this applies to all of the user generated content websites. The original premise, that everyone agreed to, was the site provides a space and some tools and users provide content to fill it. As information gets added, it becomes a valuable resource for everyone. Ads and other revenue streams become a necessary evil in all this, but overall directly support the core use case.
Now that content is being packaged into large language models to be either put behind a paywall or packed into other non-freely available services. Since they no longer seem interested in supporting the model we all agreed on, I see no reason to continue adding value and since they provided tools to remove content I may as well use them.
But from the very beginning, years ago, it was understood that when you post on these types of sites, the data is not yours, or at least you give them license to use it how they see fit. For years people accepted that, but now they're whining because they aren't getting paid for something they gave away.
It's not shared for public benefit, though. OpenAI, despite the Open in their name, charges for access to their models. You either pay with money or (meta)data, depending on the model.
Legally, sure. You signed away your rights to your answers when you joined the forum. Morally, though?
People are pissed that SO, which was actively encouraging mods to use AI-detection software to prevent any LLM usage in posted questions and answers, is now selling the publicly accessible data, made by their users for free, to a closed-source for-profit entity that refuses to open itself up.
Why?? Please make this make sense. Having AI help with coding is ideal, and probably its greatest immediate use case. The web is an open resource. Why die on this stupid hill instead of advocating for a privacy argument that actually matters?
Edit: Okay got it. Hinder significant human progress because a company I don't like might make some more money from something I said in public, which has been a thing literally forever. You guys really lack a lot of life skills about how the world really works huh?
Because being able to delete your data from social networks you no longer wish to participate in, or that have banned you, as long as they haven't specifically paid you for your contributions, is a privacy argument that actually matters, regardless of and independent of AI.
In regards to AI, the problem is not with AI in general, but with proprietary for-profit AI being trained on open resources, even those with underlying license agreements that prevent that information from being monetized.
Because none of the big companies listen to the privacy argument. Or any argument, really.
AI in itself is good, amazing, even.
I have no issue with open-source, ideally GPL- or similarly licensed AI models trained on Internet data.
But involuntarily participating in training closed-source corporate AIs… no, thanks. That shit should go to the hellhole it was born in, and we should do our best to destroy it, not advocate for it.
If you care about the future of AI, OpenAI should long be on your enemy list. They expropriated an open model, they were hypocritical enough to keep "open" in the name, and then they essentially sold themselves to Microsoft. That's not the AI future we should want.
We're in a capitalist system, and these are for-profit companies, right? What do you think their goal is? It isn't to help you. It's to increase profits. That will probably lead to massive numbers of jobs being replaced with AI, and we will get nothing for giving them the data to train on. It's purely parasitic. You should not advocate for it.
If it's open and not-for-profit, it can maybe do good, but there's no way this will.
Meta and Google have done more for open source AI than anyone else. I think a lot of antis don't really understand how computer science works, so they imagine it's like collecting up physical iron and taking it into a secret room, never to be seen again.
The actual tools and math are what's important. Research on best methods is complex and slow, but so far all these developments are being written up in papers that anyone can learn from. If people on the left weren't so performative and lazy, we could have our own AI too.
Human progress is spending cities' worth of electricity and water to ask Copilot how to use a library and have it lie back to you in natural language? Please, make this make sense.
Why do people roll coal? Why do they vandalize electric car chargers? Why do people tie ropes across bike lanes?
Because a changing world is scary and people lash out at new things.
The coal rollers think they're fighting a valiant fight against evil corporations too. They invested their effort into being a car guy, and it doesn't feel fair that things are changing, so they want to hurt the people benefiting from the new tech.
The deeper I get into this platform the more I realize the guise of being 'progressive, left, privacy-conscious, tech inclined' is literally the opposite.
Hating on everything AI is trendy nowadays. Most of these people can't give you any coherent explanation for why. They just adopt the attitude of the people around them, who also don't know why.
I believe the general reasoning is something along the lines of not wanting bad corporations to profit from their content for free. So it's just a matter of principle for the most part. Perhaps we need to wait for someone to train LLM on the freely available to everyone data on Lemmy and then we can interview it to see what's up.
Mega-corporations like Microsoft and Google are evil. Very easy explanation. Even if it were a good open source company scraping the data to train AI models, people should be free to delete the data they input. It's pretty simple to understand.