Tokens are a big reason today's generative AI falls short
Tokenization, the process by which many generative AI models make sense of data, is flawed in key ways.
Seems odd that English is the best; surely something like German would be superior?
How big is the German dataset the models can learn from, how big is the specific user base once it's set up, and what language do the engineers building it speak?
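The language comparison behind these questions is easy to check empirically. Below is a minimal sketch (assuming the tiktoken library and OpenAI's cl100k_base vocabulary, neither of which the article specifies, and illustrative sentences of my own) that counts how many tokens roughly equivalent English and German sentences need:

```python
# Rough comparison of token counts for the same idea in English vs. German,
# using OpenAI's cl100k_base vocabulary via the tiktoken library
# (pip install tiktoken). Sentences are illustrative, not from the article.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Tokenization is how language models break text into pieces.",
    "German": "Tokenisierung ist die Art, wie Sprachmodelle Text in Stücke zerlegen.",
}

for language, sentence in samples.items():
    tokens = enc.encode(sentence)
    # Non-English text and long compound words typically split into more,
    # shorter tokens, which raises cost and shrinks the effective context.
    print(f"{language}: {len(tokens)} tokens")
```

On vocabularies trained mostly on English text, the German sentence usually comes out longer in tokens, which is one practical reason models tend to handle English more cheaply regardless of which language is linguistically "better."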