I have nothing against the people that are working on AI and appreciate the work they do. However, every time I see an article about a company using AI like this I just get the vibe that it's a bunch of middle-aged men trying desperately to make things like the "future" they saw when they were kids. I've seen amazing implementations of AI in a lot of different ways but I'm so sick of dumb ideas like this because some guy that used to watch Star Trek as a kid wants to feel like they live in the future while piggybacking on someone else's work. It's like the painted tunnel in cartoons where it looks like a real tunnel but in reality it's just a very convincing lie. And that's all that it is. Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.
Sorry, I had to get that out. Also I have nothing against Star Trek and I used to watch it as a kid because my parents watched it all the time.
some guy that used to watch Star Trek as a kid wants to feel like they live in the future while piggybacking on someone else’s work.
I don't think they care about their own nostalgia. I think they want to use other people's dreams to make a lot of money. I'm also sure some of them genuinely just want to push the technological envelope just 'cause they can, ethics be damned. But ultimately, it's just money.
I would love nothing more than the utopian future Trek promised but greed is killing it.
Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.
It's a bit off-topic, but what I really want is a language model that assigns semantic values to the tokens, and handles those values instead of directly working with the tokens themselves. That would probably be far less complex than current state-of-the-art LLMs, but way more sophisticated, and require far less data for "training".
I'm not sure I understand. Do you mean hearing codewords triggering actions as opposed to trying to understand the user's intent through language? Or are there a few more layers to this whole thing than my moderate nerd cred will allow me to understand?
Not quite. I'm focusing on chatbots like Bard, ChatGPT and the likes, and their technology (LLM, or large language model).
At the core those LLMs work like this: they pick words, split them into "tokens", and then perform a few operations on those tokens, across multiple layers. But at the end of the day they still work with the words themselves, not with the meaning being encoded by those words.
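To make that pipeline a bit more tangible, here's a minimal sketch in plain Python. Everything in it is an assumption for illustration: the four-word `VOCAB`, the 2-dimensional `EMBED` table and the `toy_layer` function are all made up. Real chatbots use subword tokenizers and learned transformer layers, so treat this as a cartoon of the "split into tokens, then run operations across layers" idea, not as how any actual model is implemented.

```python
# Hypothetical toy vocabulary; real models use large subword vocabularies.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

# Hypothetical 2-dimensional vector for each token ID (values are invented).
EMBED = [[0.1, 0.0], [0.9, 0.3], [0.2, 0.8], [0.0, 0.0]]


def tokenize(text):
    """Split on whitespace and map each word to a token ID."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def toy_layer(vectors):
    """Blend each vector with the mean of all vectors.

    This is only a crude stand-in for what a real layer does.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [[0.5 * v[i] + 0.5 * mean[i] for i in range(dim)] for v in vectors]


token_ids = tokenize("The cat sat")
vectors = [EMBED[t] for t in token_ids]

# "Multiple layers": apply the same toy operation twice.
for _ in range(2):
    vectors = toy_layer(vectors)
```

Note that after the lookup step the code only ever touches vectors that were indexed by token ID, which is the part of the design I'm complaining about.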
What I want is an LLM that assigns multiple meanings for those words, and performs the operations above on the meaning itself. In other words the LLM would actually understand you, not just chain words.
Semantic embeddings are a thing. LLMs "work with tokens" but they associate them with semantic models internally. You can externalize it via semantic embeddings so that the same semantic models can be shared between LLMs.
The source that I've linked mentions semantic embedding; so does further literature on the internet. However, the operations are still being performed with the vectors resulting from the tokens themselves, with said embedding playing a secondary role.
This is evident, for example, in excerpts like:
The token embeddings map a token ID to a fixed-size vector with some semantic meaning of the tokens. These brings some interesting properties: similar tokens will have a similar embedding (in other words, calculating the cosine similarity between two embeddings will give us a good idea of how similar the tokens are).
Emphasis mine. A similar conclusion (that the LLM is still handling the tokens, not their meaning) can be reached by analysing the hallucinations that your typical LLM bot outputs, and asking why each hallucination is there.
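For anyone who hasn't seen the cosine similarity mentioned in that excerpt, here's a small sketch of the calculation. The three vectors in `EMBEDDINGS` are invented for the example (real token embeddings are learned and have far more dimensions), so only the formula itself should be taken at face value.

```python
import math

# Hypothetical 3-dimensional embeddings; the numbers are made up.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
sim_cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])

# With these invented vectors, "cat" ends up closer to "dog" than to "car".
print(sim_cat_dog > sim_cat_car)
```

The similarity is a property of the vectors attached to each token, which is why I say the embedding plays a secondary role here.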
What I'm proposing is deeper than that. It's to use the input tokens (i.e. morphemes) only to retrieve the sememes (units of meaning; further info here) that they're conveying, then discard the tokens themselves, and perform the operations solely on the sememes. Then, for the output, you translate the sememes obtained by the transformer back into morphemes (i.e. tokens).
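Here's a rough sketch of the shape of that idea, with heavy caveats. The `LEXICON`, `RELATED` and `SURFACE` tables are hypothetical and hand-written, the disambiguation rule is deliberately naive, and the middle step (the actual transformer operating on sememes) is left out entirely. It only shows tokens going in, sememes being handled, and tokens coming back out.

```python
# Hypothetical lexicon: each token (morpheme) maps to its candidate sememes.
LEXICON = {
    "bank": ["FINANCIAL_INSTITUTION", "RIVER_EDGE"],
    "river": ["WATERCOURSE"],
    "money": ["CURRENCY"],
    "shore": ["RIVER_EDGE"],
}

# Hypothetical hints: which sememes an unambiguous sememe makes more likely.
RELATED = {
    "WATERCOURSE": {"RIVER_EDGE"},
    "CURRENCY": {"FINANCIAL_INSTITUTION"},
}

# Hypothetical preferred surface form used when rendering a sememe as output.
SURFACE = {
    "FINANCIAL_INSTITUTION": "bank",
    "RIVER_EDGE": "shore",
    "WATERCOURSE": "river",
    "CURRENCY": "money",
}


def to_sememes(tokens):
    """Map tokens to sememes, then forget the tokens."""
    # First pass: unambiguous tokens provide context for the ambiguous ones.
    context = set()
    for token in tokens:
        candidates = LEXICON.get(token, [])
        if len(candidates) == 1:
            context |= RELATED.get(candidates[0], set())

    # Second pass: pick the candidate supported by context, else the first.
    sememes = []
    for token in tokens:
        candidates = LEXICON.get(token, [])
        if not candidates:
            continue  # unknown tokens are simply dropped in this sketch
        chosen = next((c for c in candidates if c in context), candidates[0])
        sememes.append(chosen)
    return sememes


def to_tokens(sememes):
    """Render sememes back into surface tokens for the output."""
    return [SURFACE[s] for s in sememes]


river_meaning = to_sememes(["river", "bank"])
money_meaning = to_sememes(["money", "bank"])
```

In this toy, the same token "bank" becomes two different sememes depending on its neighbours, and everything downstream would only see the sememe, never the string "bank".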
I believe that this would have two big benefits:
The amount of data necessary to "train" the LLM will decrease. Perhaps by orders of magnitude.
A major type of hallucination will go away: self-contradiction (for example: states that A exists, then that A doesn't exist).
And it might be an additional layer, but the whole approach is considerably simpler than what's being done currently - pretending that the tokens themselves have some intrinsic value, then playing whack-a-mole with situations where the token and the contextually assigned value (by the human using the LLM) differ.
[This could even go deeper, handling a pragmatic layer beyond the tokens/morphemes and the units of meaning/sememes. It would be closer to what @[email protected] understood from my other comment, as it would then deal with the intent of the utterance.]