Of course DeepSeek lied about its training costs, as we strongly suspected. SemiAnalysis has been following DeepSeek for the past several months. High Flyer, DeepSeek’s owner, was buying Nvidia GPU…
it’s turning out the most successful thing about deepseek was whatever they did to trick the worst fossbro reply guys you’ve ever met into going to bat for them
I'm sorry but this says nothing about how they lied about the training cost - nor does their citation. Their argument boils down to "that number doesn't include R&D and capital expenditures" but why would that need to be included - the $6m figure was based on the hourly rental costs of the hardware, not the cost to build a data center from scratch with the intention of burning it to the ground when you were done training.
It's like telling someone they didn't actually make $200 driving Uber on the side on a Friday night because they spent $20,000 on their car, but ignoring the fact that they had to buy the car either way to get to their 6 figure day job
i think you're missing the point that "Deepseek was made for only $6M" has been the trending headline for the past while, with the specific point of comparison being the massive costs of developing ChatGPT, Copilot, Gemini, et al.
to stretch your metaphor, it's like someone rolling up with their car, claiming it only cost $20 (unlike all the other cars that cost $20,000), only to find out that number is just how much it costs to fill the gas tank up once
*slaps GPU* This GPU? Perfectly fine, second hand yes, but only used to train one model, by an old lady, will run the upcoming Monster Hunter Wilds perfectly fine.
DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.
Emphasis mine. Deepseek was very upfront that this $6M was training only. No other company includes R&D and salaries when they report model training costs, because those aren't training costs.
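For what it's worth, the arithmetic in that quote is trivial to check; a minimal sketch (the 2.788M GPU hours and the $2/hour H800 rental rate are both straight from the quoted passage, and the rate is the paper's own assumption rather than a measured cost):

```python
# Reproduce the headline figure: reported GPU hours times an assumed rental rate.
gpu_hours = 2.788e6        # H800 GPU hours reported for the full training run
rate_per_hour = 2.00       # assumed rental price in USD per GPU hour (paper's assumption)

cost = gpu_hours * rate_per_hour
print(f"${cost / 1e6:.3f}M")  # -> $5.576M, the "only $6M" number in the headlines
```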
Even if they greatly underreported costs and their services are banned: the models are out there, open source and way more efficient than anything Meta and OpenAI could produce.
So it's pretty obvious that the tech giants are burning money for mediocre output.
you do know that you don’t have to be a pliant useful idiot like this, right? doing the free “open source” pr repetition (when it’s none of that)? shit’s more like shareware (if that at all - certainly doesn’t have the same spiritual roots as shareware. for them it’s some shit thrown over the wall to keep the rabble quiet)
(it’d be nice if we could popularise something like how the kernel will go “tainted”, but unfortunately the entire fucking llm field already is, so we’d need a stronger word)
Look, I get your perspective, but zooming out, there's context here that nobody's mentioning, and the thread has deteriorated into name-calling instead of looking for insight.
In theory, a training pass needs one readthrough of the input data, and we know of existing systems that achieve that, from well-trodden n-gram models to the wholly-hypothetical large Lempel-Ziv models. Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it's an expensive tradeoff.
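To make the "one read-through" point concrete, here's a toy sketch of single-pass training in the spirit of those n-gram models (purely illustrative, not anyone's production system): the entire training run is literally one loop over the data, with no gradient descent and no epochs.

```python
from collections import defaultdict

def train_bigram_counts(tokens):
    # The whole "training run": a single pass over the corpus, counting pairs.
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    # Greedy prediction: the most frequent continuation observed after `prev`.
    following = counts.get(prev)
    return max(following, key=following.get) if following else None

corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram_counts(corpus)
print(predict_next(model, "the"))  # -> "cat"
```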
From that perspective, we should expect somebody to eventually demonstrate that the Transformers paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but are still stuck in the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!
I swear, it's like whenever Chinese folks do anything, the rest of the blogosphere goes into a panic. I'm not going to insult anybody directly, but I'm so fucking tired of mathlessness.
Also, point of order: Meta open-sourced Llama so that their employees would stop using BitTorrent to leak it! Not to “keep the rabble quiet” but to appease their own developers.
I’m very confused by this; I had the same discussion with my coworker. I understand what the benchmarks are saying about these models, but have any of y’all actually used deepseek? I’ve been running it since it came out and it hasn’t managed to solve a single problem yet (70b param model; I have downloaded the 600b param model but haven’t tested it yet). It essentially compares to gpt-3 for me, which only cost OpenAI something like $4-9 million to train (can’t remember the exact number right now).
The 70b model is a distillation: Llama3.3 fine-tuned to replicate the reasoning output of the full deepseekR1 model, not a model with the R1 architecture itself.
So any criticism of the capability of that model is largely criticism of Llama3.3 and not of deepseekR1.
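For anyone unfamiliar with the term, here's a hedged sketch of what that kind of distillation amounts to in practice (names and shapes are illustrative assumptions, not DeepSeek's actual pipeline): the smaller student model is fine-tuned with an ordinary next-token loss on outputs the larger teacher already generated.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, optimizer, prompt_ids, teacher_output_ids):
    # One supervised fine-tuning step: the student learns to reproduce text
    # that the teacher (e.g. a large reasoning model) generated for the prompt.
    # `student` is assumed to be any causal LM mapping token ids to next-token
    # logits of shape (batch, seq, vocab); all names here are illustrative.
    tokens = torch.cat([prompt_ids, teacher_output_ids], dim=1)
    logits = student(tokens[:, :-1])              # predict token t+1 from tokens <= t
    targets = tokens[:, 1:]
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```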
i haven’t seen another reasoning model that’s open and works as well… its LLM base is for sure about GPT-3 levels (maybe a bit better?) but like the “o” in GPT-4o
the “thinking” part definitely works for me - ask it to do maths for example, and it’s fascinating to see it break down the problem into simple steps and then solve each step
banned from use by government employees in Australia
So is every other AI except the Copilot built into Microsoft products. Government employees can't use ChatGPT directly either. So this point is a bit disingenuous.
Both based on desperately trying to find an application for linear algebra accelerators that can generate VC-scale inflated financial valuations, so, yeah