TechTakes @awful.systems David Gerard @awful.systems 6d ago

Anthropic and Apollo astounded to find that a chatbot will lie to you if you tell it to lie to you

pivot-to-ai.com

Did you know that your chatbot might be out to deceive you? That it might be lying to you? And it might turn you into paperclips? Huge if true! Ordinary people have been using chatbots for a couple…

34 comments

Slate Scott just wrote about a billion words of extra-rigorous prompt-anthropomorphizing fanfiction on the subject of the paper; he called the article "When Claude Fights Back".

Can't help but wonder if he's just a critihype-enabling useful idiot who refuses to know better, or if he's being purposefully dishonest to proselytize people into his brand of AI doomerism and EA, or if the difference is even meaningful.

edit: The Claude syllogistic scratchpad also makes an appearance. It's that thing where we pretend that they have a module that gives you access to the LLM's inner monologue, complete with privacy settings, instead of just recording the result of someone prompting a variation of "So what were you thinking when you wrote so and so? Remember, no one can read what you reply here". Cue a bunch of people in the comments moving straight into wondering if Claude has qualia.
- I used to think that comparing LLMs to people was dumb, because LLMs are just feed-forward networks--basically seven bipartite graphs in a trench coat--that are incapable of introspection.
  
  However, I'm coming around to the notion that some of our drive-by visitors have a brain that's seven cells deep.
  
  Yeah, general artificial intelligence LLMs are definitely not. Human level intelligence, though... yeah, that depends on what particular human you're talking about.
  
  (Though, to be fair, this isn't limited to LLMs... it also applies to Eliza, for instance, or your average lump of granite.)
  
  I feel attacked.
  Seriously I hate the idea that my comments are replies to engagement bots. I'm sure some are but my seven cells are too busy to work out which ones.
  
  edit: cell
  
  We are just very good at anthropomorphizing. We created pet rocks, for example (which also shows that capitalism is more than happy to jump in on this).
- I feel like "qualia" is both an interesting concept, and a buzzword that has rapidly grown to indicate people who need to be aggressively ignored.
I don't even get where they are going with this. It is a bit like asking a suspected troll if they are a troll: if they answer yes, they are a troll; if they answer no, you still suspect they are a troll.

(This is assuming they are not doing critihype. Let's ask them. Oh no.)
AI developers need to generate criti-hype — “criticism” that says the AI is way too cool and powerful and will take over the world, so you should give them more funding to control it.

This isn’t quite accurate. The criticism is that if new AI abilities run ahead of the ability to make the AI behave sensibly, we will reach an inflection point where the AI will be in charge of the humans, not vice versa, before we make sure that it won’t do horrifying things.

AI chat bots that do bizarre and pointless things, but are clearly capable of some kind of sophistication, are exactly the warning sign that as it gains new capabilities this is a danger we need to be aware of. Of course, that’s a separate question from the question of whether funding any particular organization will lead to any increase in safety, or whether asking a chatbot about some imaginary scenario has anything to do with any of this.
- With your choice of words you are anthropomorphizing LLMs. No valid reasoning can occur when starting from a false point of origin.
  
  Or to put it differently: to me this is as ridiculous as if you were arguing that bubble sort may somehow "gain new abilities" and do "horrifying things".
  
  I had assumed the golden age of people coming here to critihype LLMs was over because most people outside of Silicon Valley (including a lot of nontechnical people) have realized the technology’s garbage but nope! we’ve got a rush of posters trying the same shit that didn’t work a year ago, as if we’ve never seen critihype before. maybe bitcoin hitting $100,000 makes them think their new grift is gonna make it? maybe their favorite fuckheads entering office is making all their e/acc dreams come true? who can say.
- What new AI abilities? LLMs aren't Pokémon.
  
  The AGI learned DECEIVE, but all I wanted it to learn is HUG.
  
  Ah yes, if there’s one lesson to be gained from the last few years, it is that AI technology never changes, and people never connect it to anything in the real world. If only I’d used a Pokémon metaphor, I would have realized that earlier.
- what if the AI sprouts wings and flies into the sky where we can't reach it?
  
  maybe that’s how the moon got mad - annoying goddamn chatbots flying in its view the whole time
- AI chat bots that do bizarre and pointless things, but are clearly capable of some kind of sophistication, are exactly the warning sign that as it gains new capabilities this is a danger we need to be aware of.
  
  hahahaha nope
  
  Here’s a video of an expert in the field saying it more coherently and at more length than I did:
  
  https://youtu.be/zkbPdEHEyEI
  
  You’re free to decide that you are right and we are wrong, but I feel like that’s more likely to be from the Dunning-Kruger effect than from your having achieved a deeper understanding of the issues than he has.
