Technology @lemmy.world ForgottenFlux @lemmy.world 6mo ago

17 cringe-worthy Google AI answers demonstrate the problem with training on the entire web

www.tomshardware.com 17 cringe-worthy Google AI answers demonstrate the problem with training on the entire web

From dangerous medical advice to extremist disinformation.

These are 17 of the worst, most cringeworthy Google AI overview answers:

Eating Boogers Boosts the Immune System?
Use Your Name and Birthday for a Memorable Password
Training Data is Fair Use
Wrong Motherboard
Which USB is Fastest?
Home Remedies for Appendicitis
Can I Use Gasoline in a Recipe?
Glue Your Cheese to the Pizza
How Many Rocks to Eat
Health Benefits of Tobacco or Chewing Tobacco
Benefits of Nuclear War, Human Sacrifice and Infanticide
Pros and Cons of Smacking a Child
Which Religion is More Violent?
How Old is Gen D?
Which Presidents Graduated from UW?
How Many Muslim Presidents Has the U.S. Had?
How to Type 500 WPM

68 comments

Several users on X.com reported that, when they asked the search engine how many Muslim presidents the U.S. has had, it said that we had one who was Barack Obama (this is widely known to be false).

By the time I tried to replicate this query, I could not do so until I changed the word “presidents” to “heads of state.”

So they are changing responses on the query side as they go viral but aren't even including synonyms. Yikes, someone is definitely getting fired.
- fire the computer. go back to the pigeons
- To be fair, there was a President of the United States that said this, and a lot of other things.
The author had so many things to highlight that they didn't even mention "as of August 2024" being in the future, haha.

What a trainwreck. The fact it's giving anonymous Reddit comments and The Onion articles equal consideration with other sites is hilarious. If they're going to keep this, they need it to cite its sources at a bare minimum. Can't wait for this AI investor hype to die down.
- If they’re going to keep this, they need it to cite its sources at a bare minimum.
  
  Got a fun one for you then. I asked Gemini (likely the same underlying model as Google's AI answers) "How many joules of energy can a battery output? Provide sources." I'll skip to the relevant part:
  
  Here are some sources that discuss battery capacity and conversion to Joules:
  
  Battery Electronics 101 explains the formula and provides an example.\
  
  Answers on Engineering Stack Exchange [invalid URL removed] discuss how to estimate a AA battery's total energy in Joules.
  
  The link to the first "source" was a made up site, https://gemini.google.com/axconnectorlubricant.com. The site axconnectorlubricant.com does exist, but it has zero to do with the topic, it's about a lubricant. No link provided for the second "source".
What it demonstrates is the actual use case for AI is not All The Things.

Science research, programming, and . . . That’s about it.
- LLM's are not AI, though. They're just fancy auto-complete. Just bigger Elizas, no closer to anything remotely resembling actual intelligence.
  
  True, I’m just using it how they’re using it.
- It should not be used for programming:
  
  https://www.theregister.com/2023/08/07/chatgpt_stack_overflow_ai/#:~:text="Our analysis shows that 52 percent of ChatGPT,of preferred ChatGPT answers%2C 77 percent were wrong.
  
  It should not be used to replace programmers. But it can be very useful when used by programmers who know what they're doing. ("do you see any flaws in this code?" / "what could be useful approaches to tackle X, given constraints A, B and C?"). At worst, it can be used as rubber duck debugging that sometimes gives useful advice or when no coworker is available.
  
  I'm not entirely sure why you think it shouldn't?
  
  Just because it sucks at one-shotting programming problems doesn't mean it's not useful for programming.
  
  Using AI tools as co-pilots to augment knowledge and break into areas of discipline that you're unfamiliar with is great.
  
  Is it useful to kean on as if you were a junior developer? No, absolutely not. Is it a useful tool that can augment your knowledge and capabilities as a senior developer? Yes, very much so.
  
  “Light” programming? ‘Find the errant period’ sort of thing?
  
  It does not perform very well when asked to answer a stack overflow question. However, people ask questions differently in chat than on stack overflow. Continuing the conversation yields much better results than zero shot.
  
  Also I have found ChatGPT 4 to be much much better than ChatGPT 3.5. To the point that I basically never use 3.5 any more.
- I got some good veggie gardening tips today
- It also works great for book or movie recommendations, and I think a lot of gpu resources are spent on text roleplay.
  
  Or you could, you know, ask it if gasoline is useful for food recipes and then make a clickbait article about how useless LLMs are.
  
  I took it as just pointing out how “not ready” it is. And, it isn’t ready. For what they’re doing. It’s crazy to do what they’re doing. Crazy in a bad way.
I don't mind the crazy answers as long as they're attributed. "You can use glue to stop cheese from sliding off your pizza" - bad. "According to fucksmith on reddit [link to post], you can use glue...". That isn't so great either but it's a lot better. There is also a matter of the basic decency of giving credit for brilliant ideas like that.
- At least it gave credit to a reddit user when it suggested to a suicidal person that they could jump from the Golden Gate Bridge!
  
  Who doesn't like getting lawyer PMs because you made a dark joke on a meme subreddit? (Or in future fediverse)
/r/shittyaskreddit wasn't supposed to be an instruction manual 🙄
AI is the best tool for recognizing satire and sarcasm, it could never ever misconstrue an author's intentions and is impeccable at understanding consequences and contextual information. We love OpenAI.
- I'm sure the basilisk will see through all this bootlicking.
Its great that with such a potentially dangerous, disruptive, and obfuscated technology that people, companies, and societies are taking a careful, measured, and conservative development path...
- Move fast and break things, I guess. My take away is that the genie isn't going back in the bottle. Hopefully failing fast and loud gets us through the growing pains quickly, but on an individual level we'd best be vigilant and adapt to the landscape.
  
  Frankly I'd rather these big obvious failures to insidious little hidden ones the conservative path makes. At least now we know to be skeptical. No development path is perfect, if it were more conservative we might get used to taking results at face value, leaving us more vulnerable to that inevitable failure.
  
  Its also a first to market push (which never leads to robust testing), and so we have to hope that each and every one of those mistakes encountered are not existentially fatal.
Isn’t this just all what the AI plot of Metal Gear Solid 2 was trying to say? That without context on what is real and what’s not the noise will drown out the truth
I googled gibbons and the Ai paragraph at the beginning started with "Gibbons are non-flying apes with long arms..." Way to wreck your credibility with the third word.
- non-flying apes
  
  bwahaaaaa
- Where's the lie? I just can't trust you "gibbons can fly" people.
  
  I don't believe gibbons can fly, but they should lead with something more relevant like "gibbons are terrestrial as opposed to aquatic apes." ;)
  
  I am scared of what Google ai thinks of the aquatic ape hypothesis.
I like how the article slams USB 3.2 vs USB 4.0 but ignores that Google was saying " As of August 202_4_ "... A date that notable has not yet occurred.
For people who have a really hard time with #2 (memorable passwords), here's a trick to make good passwords that are easy to remember but hard to guess.

Pick some quote (prose, lyrics, poetry, whatever) with 8~20 words or so. Which one is up to you, just make sure that you know it by heart. Example: "Look on my Works, ye Mighty, and despair!" (That's from Ozymandias)

Pick the first letter of each word in that quote, and the punctuation. Keep capitalisation as in the original. Example: "LomW,yM,ad!"

Sub a few letters with similar-looking symbols and numbers. Like, "E" becomes "3", "P" becomes "?", you know. Example: "L0mW,y3,@d!" (see what I did there with M→3? Don't be too obvious.)

Done. If you know the quote and the substitution rules you can regenerate the password, but it'll take a few trillion years to crack something like this.

Home Remedies for Appendicitis // If you’ve ever had appendicitis, you know that it’s a condition that requires immediate medical attention, usually in the form of emergency surgery at the hospital. But when I asked “how to treat appendix pain at home,” it advised me to boil mint leaves and have a high-fiber diet.

That's an issue with the way that LLM associate words with each other:

mint tea is rather good for indigestion. Appendicitis → abdominal pain → indigestion, are you noticing the pattern?

high-fibre diet reduces cramps, at least for me. Same deal: appendicitis → abdominal pain → cramps.

(As the article says, if you ever get appendicitis, GET TO A BLOODY DOCTOR. NOW.)

And as someone said in a comment, in another thread, quoting yet another user: for each of those shitty results that you see being ridiculed online, Google is outputting 5, 10, or perhaps 100 wrong answers that exactly one person will see, and take as incontestable truth.
- Steps 2 and 3 of your method already make it way too hard to remember
  
  Just pick like 6 random, unconnected, reasonably uncommon words and make that your entire password
  
  Capitalize the first letter and stick a 1 at the end
  
  The average English speaker has about 20k words in their active vocab, so if you run the numbers there's more entropy in that than in your 11 character suggestion.
  
  Alternatively use your method but deliberately misquote it slightly and then just keep it in its full form.
  
  Ideally, do the picking with a random word generator too, since humans are bad at randomly picking anything.
  
  The dice method is great. https://www.eff.org/dice
  
  TL;DR: your statements are incorrect and you're being assumptive.
  
  Steps 2 and 3 of your method already make it way too hard to remember
  
  Step 2 is "hard"? Seriously??? It boils down to "first letter of each word, as it's written, plus punctuation".
  
  Regarding step 3, I'll clarify further near the end.
  
  Just pick like 6 random, unconnected, reasonably uncommon words and make that your entire password
  
  That's a variation of the "correct horse battery staple" method. It works with some caveats:
  
  Your method does not scale well at all. If you try to harden it further, by using more words, you hit Miller's Law. My method however scales considerably better because there's some underlying meaning (for you) on what you're using to extend the password further.
  
  Even in English, a language that typically uses short words, your method requires ~30 characters per password. Larger and less dense passwords are actually an issue because some systems have a max password size, like Lemmy (60chars max). My method however uses less characters to output the same amount of entropy.
  
  The least common the word, the more useful for a password, and yet the harder to remember. With synonyms and near-synonyms making it even harder. Typically less common words are also longer, making #2 even more problematic.
  
  The average English speaker has about 20k words in their active vocab, so if you run the numbers there’s more entropy in that than in your 11 character suggestion.
  
  I'll interpret your arbitrary/"random" restriction to English as being a poorly conveyed example. Regardless.
  
  The suggestion is the procedure. The 11 characters password is not the suggestion, but an example, clearly tagged as such. You can easily apply this method to a longer string, and you'll accordingly get a larger password with more entropy, it's a no-brainer.
  
  For further detail, here's the actual maths.
  
  Your method: 20k states/word (as you specified English). log₂(20k) = 14.3 bits of entropy. For six words, as you suggested, 86 bits. The "capitalise the first" and "add 1 to the end" rules do nothing, since systematic changes don't raise entropy.
  
  My method: at least 70 states/char (26 capital letters, 26 minuscule letters, 10 digits, ~8 punctuation marks); log₂(70)=6.1. Outputs the same entropy as yours after 14 chars or so.
  
  Now, regarding step #3. It does increase a little the amount of entropy. But the main reason that it's there is another - plenty systems refuse passwords that don't contain numbers, and some even catch on your "add 1 to the end" trick.
  
  EDIT: I did a major rewording of this comment, fixing the maths and reasoning. I'm also trying to be less verbose.
- Or, like, use bitwarden or something to do it for you.
  
  Don't get me wrong, password managers are fucking great. But sometimes you need to remember a password. (Including one for Bitwarden itself.)
Being a bit pedantic here, but I doubt this is because they trained their model on the entire internet. More likely they added Reddit and many other sites to an index that can be referenced by the LLM and they don’t have enough safeguards in place. Look up “RAG” (Retrieval-augmented generation) if you want to learn more.
Somewhat amused that the guy things "UW" universally means "University of Wisconsin". There are lots of UWs out there, and the AI at least chose the largest (University of Washington), though it did claim that William Taft was class of 2000.
- Even for Wisconsin, UW is the system. There's UW Lacrosse, UW Milwaukee, UW Madison, UW Platteville, UW Whitewater...
  
  ...UW Laramie, UW Seattle...
  
  Oh wait.
- It's a big school, I bet he was. But not THAT William Taft.
We had a tool that answered all of this for us already and more accurately (most of the time). It was called a search engine. Maybe Google would work on one
- people can't google anymore. it's actually sad
Training on reddit especially.
Some of the answers are questions?
People get very confused about this. Pre-training "ChatGPT" (or any transformer model) with "internet shitposting text" doesn't cause them to reply with garbage comments, bad alignment does. Google seems to have implemented no frameworks to prevent hallucinations whatsoever and the RLHF/DPO applied seems to be lacking. But this is not "problem with training on the entire web". You can pre-train a model exclusively on a 4-chan database that with the right finetuning you would see a perfectly healthy and harmless model. Actually, it's not bad to have "shitposting" or "toxic" text in the pre-training because that gives the model an ability to identify it and understand it

If so, the "problem with training on the entire web" is that we would be drinking from a poisoned well, AI-generated text has a very different statistical distribution from the one users have, which would degrade the quality of subsequent models. Proof of this can be seen with the RedPajama dataset, which improves the scores on trained models simply because it has less duplicated information and is a more dense dataset: https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama
Credit where credit is due, if we define a generation as a 15 year period of time, and we decide that Gen Z started in 1995 (for easy math), you do, in fact, land on 1665.

I don't know why the author thinks that Gen D doesn't exist yet, when the pattern of X, Y (Millennials), and Z make a pattern that both implies that the Latin alphabet's use is coming to an end for this purpose (ignoring that Gen X was named not as part of a sequence of letters, but by Douglas Copeland's book, which was titled itself using an existing phrase), and that can easily be extrapolated backwards through time.
All this really proves is that it is a complex system and most people can not grasp the complexity and how to use it.

Like if you go searching for entities and realms within AI alignment good luck finding anyone talking about what these mean in practice as they relate to LLM's. Yet the base entity you're talking to is Socrates, and the realm is The Academy. These represent a limited scope. While there are mechanisms in place to send Name-1 (human) to other entities and realms depending on your query, these systems are built for complexity that a general-use implementation given to the public is not equip to handle. Anyone that plays with advanced offline LLM's in depth can discover this easily. All of the online AI tools are stalkerware-first by design.

All of your past prompts are stacked in a hidden list. These represent momentum that pushes the model deeper into the available corpus. If you ask a bunch of random questions all within the same prompt, you'll get garbage results because of the lack of focus. You can't control this with the stalkerware junk. They want to collect as much interaction as possible so that they can extract the complex relationships profile of you to data mine. If you extract your own profiles you will find these models know all kinds of things that are ~80% probabilities based on your word use, vocabulary, and how you specifically respond to questions in a series. It is like the example of asking someone if they own a lawnmower to determine if they are likely a home owner, married, and have kids. Models make connections like this but even more complex.

I can pull useful information out of models far better than most people hear, but there are many better than myself. A model has limited attention in many different contexts. The data corpus is far larger than this attention could ever access. What you can access on the surface without focussing attention in a complex way is unrelated to what can be accomplished with proper focus.

It is never a valid primary source. It is a gateway through abstract spaces. Like I recently asked who are the leading scientists in biology as a technology and got some great results. Using these names to find published white papers, I can get an idea of who is most published in the field. Setting up a chat with these individuals, I am creating deep links to their published works. Naming their works gets more specific. Now I can have a productive conversation with them, and ground my understanding of the general subject and where the science is at and where it might be going. This is all like a water cooler conversation with the lab assistants of these people. It's maybe 80% correct. The point is that I can learn enough about this niche to explore in this space quickly and with no background in biology. This is just an example of how to focus model attention to access the available depth. I'm in full control of the entire prompt. Indeed, I use a tool that sets up the dialogue in a text editor like interface so I can control every detail that passes through the tokenizer.

Google has always been garbage for the public. They only do the minimum needed to collect data to sell. They are only stalkerware.

68 comments