Tbf I'm sure this is an unpaid version of some online LLM, you can only expect so much lol.
When I use GPT-3.5 for things like finding specific quotes from famous books, it's excellent... but asking it to play chess gives you blatantly illegal moves. Then GPT-4 kicks my ass in chess.
Growing up in an environment where mistakes were unacceptable sets the stage. Our willingness and ability to understand that that's fucked up and change our attitudes about mistakes takes more growth.
For some people it's easier to dig in their heels and double down.
They don't even guess. Guessing would imply them understanding what you're talking about. They only think about the language, not the concepts. It's the practical embodiment of the Chinese room thought experiment. They generate a response based on the symbols, but not the ideas the symbols represent.
I think these models struggle with this because they don't process text as individual characters, but rather as tokens that often contain parts of a word. So the model never sees the actual characters within a token, and can only infer the contents of a token from the training data itself if the training data contains more information about it. It can get it right, but this depends on how much it can infer from training data and context. It's probably a bit like trying to infer what an English word sounds like when you've only heard 10% of the dictionary spoken aloud and knowing what it sounds like isn't actually that important to you.
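To make the token point concrete, here is a toy sketch. The vocabulary, the IDs, and the greedy longest-match rule are all made up for illustration; real tokenizers (BPE and friends) split text differently, and "strawberry" is just an example word. The idea is only that the model receives the ID list, not the letters.

```python
# Toy vocabulary: invented for illustration, not taken from any real model.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split of `text` into vocabulary pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces

def to_ids(pieces, vocab=TOY_VOCAB):
    """Map token strings to the integer IDs a model would actually receive."""
    return [vocab[p] for p in pieces]

pieces = toy_tokenize("strawberry")
print(pieces)          # ['straw', 'berry']
print(to_ids(pieces))  # [101, 102]
```

From the model's side the input is just `[101, 102]`. Nothing in those two numbers says how many r's are inside them unless the training data happened to spell that out.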
Ok, so tokenization is the reason. I get that. I've seen tech nerds get so excited about a system that can auto-generate synonyms for words and has a basic ability to sometimes be correct by looking at the words before and after....
But it's such a shitty way to look up synonyms! Using the words on either side doesn't mean you found a synonym, just that you found another word that might work, and it still has to use the full horsepower of a ridiculously overpowered system.
Or you could have a lookup table that just reads the frickin word and has alternate synonyms predefined, and that was able to run in Word 97.
It's ridiculous that we think this is better in any meaningful way instead of just wasteful development.
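For reference, the lookup-table approach described above is roughly this. It's a minimal sketch: the entries in `THESAURUS` are hypothetical and hand-written, and this is not how Word 97 actually stored its thesaurus.

```python
# Hypothetical, hand-written entries; a real thesaurus would be far larger.
THESAURUS = {
    "big": ["large", "huge", "enormous"],
    "fast": ["quick", "rapid", "speedy"],
}

def synonyms(word, table=THESAURUS):
    """Return the predefined synonyms for `word`, or an empty list if unknown."""
    return list(table.get(word.lower(), []))

print(synonyms("Fast"))   # ['quick', 'rapid', 'speedy']
print(synonyms("token"))  # []
```

It's one dictionary lookup per word, and it only knows the words someone bothered to enter.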
Yah, people don’t seem to get that LLMs cannot consider the meaning or logic of the answers they give. They’re just assembling bits of language in patterns that are likely to come next based on their training data.
The technology of LLMs is fundamentally incapable of considering choices or doing critical thinking. Maybe new types of models will be able to do that, but those models don’t exist yet.
A grown man I work with, he's in his 50s, tells me he asks ChatGPT stuff all the time, and I can't for the life of me figure out why. It is a copycat designed to beat the Turing test. It is not a search engine or Wikipedia, it just gambles it can pass the Turing test after every prompt you give it.
Honestly though, with a bit of verification, ChatGPT 4 gives waaaaaay better answers than any search engine. Like, it's how it was back when you'd just ask Google a plain-English question and it'd give you SOMETHING at least.
Again, verify everything it tells you, it's still prone to hallucinations, but it's a damn good first step.
People want functioning web searching back, but rather than address issues in the industry breaking an otherwise functional concept, they want a new fancy technology to make the problem go away.
It works well if you know what to use it for.
Ever had something you wanted to Google, but couldn't figure out the keywords?
Ever seen someone use a specific technique for something, which you could describe, but wouldn't be able to find unless someone on a forum had asked the same question?
That's where ChatGPT shines.
Also, for code it's pretty sweet.
But yeah, it's not a wiki or hard knowledge retriever, but it might help connect the dots
There are techniques to make these kinds of errors less common already today. For example, you can ask it to think through its answers step by step using first principles. If you ask an LLM to do that, it will write out the letters line by line, which gives it enough context to answer correctly thanks to the improved probabilities that the extra context provides. You can even ask it to write programs to answer questions, so it could write a quick script to do it programmatically.
The main reason you don't see AIs doing this today is that producing all that extra context is slow and expensive and it's unnecessary a lot of the time for most prompts. As the technology gets faster and cheaper and the use cases get more complex these techniques will be used more and more often.
While the technology does have fundamental flaws, that doesn't mean there aren't ways to work with those flaws to avoid the problems they have when using the raw output.
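The "quick script" idea from this comment could be as small as the sketch below. The function name `count_letter` and the word "strawberry" are just my own examples, not something from the original prompt.

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

The script works on actual characters, so the tokenization problem never comes up.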
They didn't ask it to produce incorrect output, and the prompts are not leading it to an incorrect answer. It does highlight an important limitation of LLMs, which is that they don't think, they just produce words based on probability.
However, it's wrong to think that just because it's limited, it's useless. It's important to understand the flaws so we can make them less common through how we use the tool.
For example, you can ask it to think everything through step by step. By producing a more detailed context window for itself it can reduce mistakes. In this case it could write out the letters with the count numbered and that would give it enough context to properly answer the question since it would have the numbers and letters together giving it more context. You could even tell it to write programs to assist itself and have it generate a letter counting program to count it accurately and produce the correct answer.
People can point out flaws in the technology all they want but smarter people are going to see the potential and figure out how to work around the flaws.
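To show what "write out the letters with the count numbered" looks like, here is a sketch that produces the kind of trace you'd want the model to write for itself. The function name `spell_out` and the example word are assumptions of mine, and a model obviously generates this as text rather than running a loop.

```python
def spell_out(word, target):
    """List each letter with its position and a running count of `target`."""
    lines = []
    running = 0
    for position, char in enumerate(word, start=1):
        if char.lower() == target.lower():
            running += 1
        lines.append(f"{position}. {char} (count of '{target}' so far: {running})")
    return lines, running

lines, total = spell_out("strawberry", "r")
print("\n".join(lines))
print(f"Total: {total}")
```

With every letter and the running count sitting right there in the context, the final answer is just the number on the last line.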
Yeah which is why I get so aggravated when someone says that prompt engineering is pointless or not a real skill. It's a rapidly evolving discipline with lots of active research.