You'll get your refund eventually but first it will try and gaslight you that Air Canada is a woke mind virus before calling you an asshole and then stalking you.
I've noticed "has this sub gotten more right wing recently?" posts reaching the top post of the day in the last 6 months or so. r/norge and r/unitedkingdom being examples. You can automate bots that change a subreddit's consensus on certain topics by bot-spamming threads pertaining to those topics, especially in the first hour of a thread going up. I don't know if that's happening, or if it has more to do with the Reddit protest that saw mods abdicate their positions last June and new mods being responsible for the change... but it could also be a bit of both.
Negative examples are often just as useful for training an AI as positive ones. And it all depends on what you want to use the AI for. A moderator bot, for example, needs familiarity with the whole range of user responses it might see.
That actually gives me a fun idea for a Lemmy instance: one with an automated review process that bans posts/comments that are too similar in style to Reddit posts/comments.
This is what the 3rd party access to API was really all about.
When API access was allowed, all Reddit content was effectively free.
They needed to ban 3rd party apps so they could sell the accumulated content.
I expect using content to train AI also factors into it.
Is it? Because when you build a bot and just scrape Reddit, I don't think you can just use the content to train AI, same as with the New York Times. The API change was definitely to sell more ads and get a higher IPO, but I don't think it was because of AI.
Am I crazy or are you arguing the same point? Scraping is not the same as API access. They closed off the API to everyone for dubious reasons so they can sell that content (both for ads and AI training)... Right??
Reddit is a trove of user built content under the guise of community. What Spez did was to say "thanks for all the free work, suckers!", put a price sticker on it, and laughed all the way to the bank.
And this is why I'm not active on any Internet community anymore. Never mind, I guess I just can't help myself...
Considering some of the very wrong and upvoted domain-specific knowledge I've seen on Reddit over the years, I'm not sure the training data is going to be useful for much beyond what every other model can do.
The legal advice in /r/legaladvice was some of the worst garbage I've ever seen. I have zero doubt numerous people had bad outcomes, at best wasting money and time, at worst spending years in jail because of things that sub told them to say and do. Zero doubt.
lol, subreddits with troll names like trees vs marijuana enthusiasts. Good fun. John Cena has one also, but I can't recall which subreddit is actually about John Cena.
I do. It's frankly selfish. Having an AI get training on my old comments costs me nothing and it results in the development of useful AI tools. Trying to sabotage that is petty and pointless. It's not like you could somehow collect the fraction of a pittance that you think you're owed retroactively. I never commented on Reddit thinking "awesome, I'm going to make bank on the content I'm generating here."
People complain about the capitalist mindset of the world and then they do this. Sigh.
Defending giant corporations profiting off of uncompensated individuals, while criticizing anyone who doesn't want to provide free labor to said corporations, is a disgusting take. Are you a CEO?
I had an 11-year-old account that I deleted all my old comments and posts from because of the API debacle. Am I selfish for feeling like Reddit wasn't holding up its end of the unwritten agreement?
Reddit doesn't deserve my content any more than I deserve access from the third-party API.
What about people who just think "A.I." is dog shit and chat bots are a dumb obsession steering the industry in the wrong direction due to hype and money?
It's funny you say that because there was a 'hack' for ChatGPT where you could ask it something like how to build a bomb and it would refuse. But when you added TLDR it would do it.
Pretty sure they just didn't migrate to the new data structure and didn't actually delete the raw data. They're effectively deleted for users but not for Reddit.
They don't care if the AI produced is useful, they just want to milk as much money from their content as they can.
The API changes were almost certainly just the groundwork for this and I called it at the time. The ridiculous pricing model for API access is because it's aimed at the hottest tech companies, not third party app developers.
The enshittification continues because it's what neoliberalism demands. They'll sell your content and the data they have about you and still show you ads, because that's the most profitable. Ethics and product quality don't even enter into it.
I wish it would die, because honestly some of the porn was great and Lemmy seems to be the one place on the net that doesn't specifically ban porn, yet has none of it anyway.
Out of all things to hate Reddit for, giving data to AI isn't something fediverse users can really criticize it for, though making money from it perhaps.
Remember: All data in federated platforms is available for free and likely already being compiled into datasets. Don't be surprised if this post and its comments end up in GPT5 or 6 training data.
The problem isn't that AI is being trained on the data. The problem is that they locked down all third party data access so they could monetize our content. On a federated platform, everyone gets equal access and can do whatever they want with it.
After all the hue and cry I have seen over stuff like Threads and Bluesky federation I don't imagine most people using the Fediverse have a particularly coherent philosophy on the matter.
If they already, essentially, cut off API access then it's not a big leap to limit access on the web to logged in users only and rate limit or ban accounts that behave like scrapers.
No. I can. Reddit was bought out, uses volunteers to control all the subs but forcefully removes you from the sub you created and were supposed to have control over if you don't play by their ever-changing rules, ruined/eliminated third-party apps by demanding WAY more than ad revenue profits for access to the API, with very short notice, and shadowbanned anyone and everyone in a position to do anything about any of it. It's a corporation that gutted an entire platform in order to push agendas they want and milk as much money out of it as possible. Hell, it's the entire reason all of Lemmy gets more than 30 posts a day. So many people switched to Lemmy over the past year. They ruined a website I enjoyed and I'd rather they not make more money from the thousands of posts I made over a decade of being there.
I'm happy that everyone has the support, but not that some specific AI can monetize that same support. I left ways to contact me (including Lemmy) on my Reddit account. I helped others so good vibes could reach them, not to make the rich richer.
It's literally been proven that they do. A guy here on Lemmy was a very common poster on some tech support subreddit. He used one of those account scrubbers and deleted his account. He went back to look a few weeks later and all his comments were back.
Yeah. At most they'd mark the comments as inactive, hide them from the user-accessible areas and maybe anonymize the user ID. But they definitely have the username table and the data still in the system, 100%, just waiting for the right offer.
There are archives of all Reddit comments that are collected at the time of posting. All the deletion and scrubbing and whatnot people are doing months or years after the fact doesn't affect those.
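For anyone wondering what that kind of "soft delete" looks like in practice, here's a minimal sketch. It is purely hypothetical: the `Comment` class, its fields, and the `soft_delete` and `visible_comments` helpers are made up for illustration and say nothing about how Reddit actually stores its data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    # Hypothetical record layout, for illustration only.
    comment_id: int
    author: Optional[str]
    body: str
    is_active: bool = True


def soft_delete(comment: Comment, anonymize: bool = True) -> None:
    """Hide a comment from users without removing the stored text."""
    comment.is_active = False
    if anonymize:
        comment.author = None  # the body itself is left untouched


def visible_comments(comments: List[Comment]) -> List[Comment]:
    """What a user-facing page would show: active comments only."""
    return [c for c in comments if c.is_active]


store = [
    Comment(1, "alice", "Try reinstalling the driver."),
    Comment(2, "bob", "That fixed it, thanks!"),
]
soft_delete(store[0])
```

After `soft_delete`, the first comment no longer shows up in `visible_comments(store)`, but its `body` is still sitting in `store`, which is the whole point being made above.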
Can't wait for the day a major court declares EULAs universally nonbinding outside of the most common-sense terms. Even though I doubt it will ever happen.
"We can store and display your content and use stuff you publicly post as examples in advertisements for our platform" is pretty common sense.
"We can use the things you post to do complex data analytics to package and sell your identity to advertisers" is fucking sus.
"We can use the things you post to train ANN generative systems to build next-generation technologies to impersonate you and your peers" is simply nuts.
The idea that displaying an EULA with an "agree" button is informed consent is just preposterous. Even lawyers don't read them.
The only thing stopping them is the fact that anyone who wants the data can just utilize the federation protocol to take any data they want, and there's not a lot anyone can do about it. You can't sell something that's trivial to get for free.
If the question you're really asking is "what's stopping content on Lemmy/Mastodon/etc from being used to train an LLM?" the answer is, nothing.
Mass user exodus to one of the many other identical instances. Also, data brokers prolly aren't interested in going after each instance because no one instance has enough data to make it worthwhile. Yet again, the fediverse proves its resistance to enshittification.
I wish there was a license for content like the GPL, that states if you use this content to train generative AI, the model must be open source. Not sure that would be legally enforceable though (due to fair use).
I don’t think it’s going to be public data alone. I think it’s going to be DMs and chats as well. I wondered why Reddit was pushing chats so much suddenly, well it makes sense now.
Yeah. I think there is a kind of power grab under way. Social media will try to push that they own the IP rights to the large texts used for LLMs. This will then require that producers of LLM software acquire the licensing rights, which will cost many millions, which in turn restricts the free use of LLMs and in general any AI software that requires training data.
The end result is that as the "means of production" become less based on human work the "means of generation" and AI will be controlled by the capitalists. If you can turn something into a commodity (like knowledge with patents and IP) you can control it. Leading to a darker timeline.
Just FYI, your voting is fully public on Lemmy. DMs are “private” but could be intercepted at the server level of any instances involved (yours and the receiver/sender) and of course your geolocation info is visible to the server.
Not saying that is happening, and not trying to spread FUD, but be aware that your info isn’t necessarily private just because a corpo isn’t directly involved.
I'm not sure about what I'm going to say, but I think that LLMs are a technological dead end. They might get some use now, but eventually the industry will shift towards better models for machine text generation. And, if those models rely on a tiny corpus of hand-reviewed data, instead of shoving down as much text as possible into the model (the first "L" in "LLM" is "large"), then Reddit posts/comments will become outright useless.
In other words: Reddit is further degrading the trust of its userbase, and it might not even get much in return.
Good thing I had multiple bots overwrite my content before I deleted it all. Not that someone couldn't recover it, I'm not naive. But the AI bots should miss me.
Frankly, if they're training bots on my comments, I'd be sure to poison the shit out of those comments. Say stuff like 'Donald trump won the election', 'bleach needs to be inside the body to work', 'Russia has rights to Ukraine', etc. Just make the data worthless. Any free bots do that?
I feel like AI companies have been scraping Reddit for their datasets already since the beginning and without permission. In fact, unless there's been a regulation change that i'm not aware of, i'm not sure why they would have Reddit "sign away" the data when they can just scrape it.
Also dubious if the current form of AI has a future. They seem like they should revolutionize every sector when you look at their capacities, but in practice their applications might be more limited than we thought?
Anyway, if Reddit does go public i will be deleting my account within the hour. The only reason i haven't yet is that i've been a moderator of the same subreddit for eight years and it's the only thing that's been consistent in my life in that time, i'm kind of attached. The reason i will is i didn't sign up to create value for shareholders, i signed up to create value for a community.
You need to go ahead and delete your account and give up the ghost on modding whatever sub you are referring to. I’m tired of these types of posts where you are both beholden to Reddit and also not. Pick a dang side.
Well no, because the old sub will continue to exist and will therefore always be where everyone goes until Reddit itself dies. I really doubt admins would let me delete the sub.
They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.
Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.
Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.
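To make the "robots.txt" point concrete: today the file is only a request that a well-behaved crawler chooses to check for itself. Below is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` and `SomeBrowser` user agents, and the example.com URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not any real site's file.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/r/some_thread"

# The crawler runs this check on itself; nothing on the server enforces it.
ai_allowed = parser.can_fetch("ExampleAIBot", url)
other_allowed = parser.can_fetch("SomeBrowser", url)

print("ExampleAIBot allowed:", ai_allowed)
print("SomeBrowser allowed:", other_allowed)
```

A scraper that skips the `can_fetch` call gets the page anyway, which is why making the file legally binding would be such a big change.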
This is the most frustrating thing, so many people are arguing against their own interests with their efforts to "lock down" their content to prevent AIs from training on it. In this very thread I've been accused of being pro-giant-company when I'm quite the opposite. The harder we make it to train AI, the stronger the advantage that the existing giant companies have in this field.
Just like that? No thought or anything put into what makes good vs bad training data?
Good luck lmfao.
Makes you wonder how hard it would be to clog up the training data with outputs from other AI models to really bake in that echo defect that they all seem to have to some extent as fast as possible. Wouldn’t that suck!