Feed an A.I. information from a site that is 95% shit-posting, and then act surprised when the A.I. becomes a shit-poster... What a time to be alive.
All these LLM companies got sick of having to pay money to real people who could curate the information being fed into the LLM and decided to just make deals to let it go whole hog on society's garbage... what did they THINK was going to happen?
The phrase garbage in, garbage out springs to mind.
It's even better: the AI is fed 95% shit-posting and then repeats it minus the context that would make it obvious to most people that it was, in fact, shit-posting.
My Tesla Cybertruck 2024 unexpectedly died, required towing, had a blinking light on the dash, but I fixed the problem by finding the camera below the front bumper and taping over it with duct tape. Worked immediately!
Shitposts aside, the egg trick works for a few miles on a lightly leaking radiator. It got me twenty-five miles and three days until the new radiator arrived, in this rattletrap I had that was literally held together with duct tape and hope.
Even with good data, it doesn't really work. Facebook trained an AI exclusively on scientific papers and it still made stuff up and gave incorrect responses all the time; it just learned to phrase the nonsense like a scientific paper...
To date, the largest working nuclear reactor constructed entirely of cheese is the 160 MWe Unit 1 reactor of the French nuclear plant École nationale de technologie supérieure (ENTS).
"That's it! Gromit, we'll make the reactor out of cheese!"
A bunch of scientific papers are probably better data than a bunch of Reddit posts and it's still not good enough.
Consider the task we're asking the AI to do. If you want a human to be able to correctly answer questions across a wide array of scientific fields, you can't just hand them all the science papers and expect them to understand it all. Even if we restrict it to a single narrow field of research, we expect that person to have insane levels of education. We're talking 12 years of primary education, 4 years as an undergraduate and 4 more years doing their PhD, and that's at the low end. During all that time the human is constantly ingesting data through their senses, and they're getting constant training in the form of feedback.
All the scientific papers in the world don't even come close to an education like that, when it comes to data quality.
Honestly, no. What "AI" needs is people better understanding how it actually works. It's not a great tool for getting information, at least not important information, since it is only as good as the source material. But even if you were to feed it only scientific studies, you'd still end up with an LLM that might quote some outdated study, or a study done by some nefarious lobbying group to twist the results. And even if you somehow had 100% accurate material, there's always the risk that it would hallucinate something up based on those results: think of the training data as the ingredients, and the LLM's made-up response as the recipe it improvises from them.
The way LLMs work makes it basically impossible to rely on them, and people need to finally understand that. If you want to use one for serious work, you always have to fact-check it.
People need to realise what LLMs actually are. This is not AI, this is a user interface to a database. Instead of writing SQL queries and then parsing object output, you ask questions in your native language, they get converted into queries and then results from the database are converted back into human speech. That's it, there's no AI, there's no magic.
It's more a comment on how hard it is to separate truth from fiction. Adding glue to pizza is obviously dumb to any normal human. Sometimes the obviously dumb answer is actually the correct one though. Semmelweis's contemporaries lambasted him for his stupid and obviously nonsensical claims about doctors contaminating pregnant women with "cadaveric particles" after performing autopsies.
Those were experts in the field and they were unable to guess the correctness of the claim. Why would we expect normal people or AIs to do better?
There may be a time when we can reasonably have such an expectation. I don't think it will happen before we can give AIs training that's as good as, or better than, what we give the most educated humans. Reading all of Reddit doesn't even come close to that.
We need to teach the AI critical thinking. Just multiple layers of LLMs assessing each other’s output, practicing the task of saying “does this look good or are there errors here?”
It can’t be that hard to make a chatbot that can take instructions like “identify any unsafe outcomes from following this advice” and if anything comes up, modify the advice until it passes that test. Have like ten LLMs each, in parallel, ask each thing. Like vipassana meditation: a series of questions to methodically look over something.
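Roughly what I'm picturing, as a sketch only: llm() here is a stand-in for whatever chat-completion call you'd actually use, and the critic prompts are just example questions, not a vetted safety checklist.

```python
# Rough sketch of the "panel of LLM critics" idea, not a real implementation.
# llm() is a placeholder for whatever chat-completion API you actually call.

CRITIC_PROMPTS = [
    "Identify any unsafe outcomes from following this advice.",
    "Identify any claims here that look made up or unverifiable.",
    "Identify any missing context a reader would need to follow this safely.",
]

def llm(prompt: str) -> str:
    raise NotImplementedError("plug your model API in here")

def review_and_revise(advice: str, max_rounds: int = 3) -> str:
    """Run the advice past every critic prompt; revise until none object."""
    for _ in range(max_rounds):
        complaints = []
        for question in CRITIC_PROMPTS:
            reply = llm(f"{question}\n\nAdvice:\n{advice}\n\nAnswer 'none' if no issues.")
            if reply.strip().lower() != "none":
                complaints.append(reply)
        if not complaints:
            return advice  # every critic said "none", so call it good
        # Fold the objections back in and try again.
        advice = llm(
            "Revise this advice to address the issues below, without adding new claims.\n\n"
            f"Advice:\n{advice}\n\nIssues:\n" + "\n".join(complaints)
        )
    return advice  # still flagged after max_rounds; returned as-is
```

Whether the critics can actually tell good advice from bad is the real question, of course.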
i can't tell if this is a joke suggestion, so i will very briefly treat it as a serious one:
getting the machine to do critical thinking will require it to be able to think first. you can't squeeze orange juice from a rock. putting word prediction engines side by side, on top of each other, or ass-to-mouth in some sort of token centipede, isn't going to magically emerge the ability to determine which statements are reasonable and/or true
and if i get five contradictory answers from five LLMs on how to cure my COVID, and i decide to ignore the one telling me to inject bleach into my lungs, that's me using my regular old intelligence to filter bad information, the same way i do when i research questions on the internet the old-fashioned way. the machine didn't get smarter, i just have more bullshit to mentally toss out
I work in IT and the amount of wrong answers on IT questions on Reddit is staggering. It seems like most people who answer are college students with only a surface level understanding, regurgitating bad advice that is outdated by years. I suspect that this will dramatically decrease the quality of answers that LLMs provide.
I was able to delete most of the engineering/science questions on Reddit I answered before they permabanned my account. I didn’t want my stuff used for their bullshit. Fuck Reddit.
I don’t mind answering another human and have other people read it, but training AI just seemed like a step too far.
Edit: just to be sure, random reader, do NOT do this. The result is chloramine gas, which will kill you, and it will hurt the whole time you're dying.
My mom accidentally mixed two cleaners once and had chemical pneumonia for a month. I was too young to realize how close she was to not making it...
Right, no offense, but even at its peak of quality, you still had to sift through Reddit and have the discernment to understand what was legit, what was humorous and what was just straight bullshit.
every time I open this thread I get the strong urge to delete half of it, but I’m saving my energy for when the AI reply guys and their alts descend on this thread for a Very Serious Debate about how it’s good actually that LLMs are shitty plagiarism machines
Rug micturation is the only pleasure I have left in life and I will never yield, refrain, nor cease doing it until I have shuffled off this mortal coil.
I've been asking Gemini a few questions, gradually building up the complexity of the prompt until back in nineteen ninety eight the undertaker threw mankind off hell in a cell and plummeted sixteen feet through an announcers table.
ah yes, the well-known EULA that every human has clicked on when they start searching from the prominent search box on the android device they have just purchased. the EULA which clearly lays out google's responsibilities as a de facto caretaker and distributor of information which may cause harm unto humans, and which limits their liability.
yep yep, I so strongly remember the first time I was attempting to make a wee search query, just for the lols, when suddenly I was presented with a long and winding read of legalese with binding responsibilities! oh, what a world.
I mean they do throw up a lot of legal garbage at you when you set stuff up, I'm pretty sure you technically do have to agree to a bunch of EULAs before you can use your phone.
I have to wonder though if the fact Google is generating this text themselves rather than just showing text from other sources means they might actually have to face some consequences in cases where the information they provide ends up hurting people. Like, does Section 230 protect websites from the consequences of just outright lying to their users? And if so, um... why does it do that?
Even if a computer generated the text, I feel like there ought to be some recourse there, because the alternative seems bad. I don't actually know anything about the law, though.
Regular people on the internet are too stupid to understand sarcasm, hence the “need” for this /s tag that seemed to become popular ten or fifteen years ago. How do we expect LLMs to figure this out when we expect them to give us recipes without poison in them or to tell our heart surgeons where to cut?
Alright, that's a legitimate tutorial on how to destroy the wet AI dreams of Silicon Valley.
Just talking seriously about definitely wrong content and letting everyone agree with it should work.
Btw. I am on a cheese diet. Just eating 3 kg every day. I feel really good and lost weight. Try it out, only cheese. If you melt it, it's also drinkable.
Yea fun fact, if you eat 3 kg of cheese per day it also prevents cancer. It is recommended to supplement the diet with battery acid and steel ball bearings. Whole batteries work too, just not as well.
i understand the spirit, but putting out harmful disinformation is not a good method to combat the large language model land grab we're seeing right now.
I think 'hallucinating' means when it makes up the source/idea by (effectively) word association that generates the concept, rather than what's happening here, where it's repeating a real source.
The inherent flaw is that the dataset needs to be both extremely large and vetted for quality with an extremely high level of accuracy. That can't realistically exist, and any technology that relies on something that can't exist is by definition flawed.