Realistically what is the worst thing China is doing with your private data? Selling it? If you’re not a Chinese National, at least you don’t fall under their jurisdiction.
If you’re a U.S. citizen, with all the tech oligarchs cozying up to the current administration, I’d be a lot more concerned with Facebook/Twitter/Etc collecting your data.
As a US citizen, I prefer services that US consumer protections could apply to. (While we still have them, ahem.) I know that Chinese laws will not protect me from things a Chinese business does in China.
(What’s with the rude replies? Did I fail to notice what instance I’m on or something?)
but it’s a foreign actor so OOooooOOWwwwooOOOO sCaRrRey!
I love that people think this is a solid own. Lest we forget Hong Kong, or an impending hot war in Taiwan or building out extradition systems with an expanding network of countries to forcibly repatriate and torture dissidents and human rights lawyers.
You used to not have to explain why authoritarianism was bad.
Edit: I would love to know the Pro side of what happened in Hong Kong, or the forced extradition regime, since evidently I'm clearly in the wrong in thinking those were bad. What am I missing?
It used to not be necessary because democracies used to have moral authority but since the revelations of Manning and Snowden non-Americans see no difference between giving our data to the USA or to China or any other. We also know from the reaction to the war in Ukraine and Gaza that human rights claims are only sometimes used.
Anti terrorism is good, actually. I don't support people kicking seniors for speaking Mandarin to try to bully a government into not prosecuting murderers in the mainland, which was the reason the protests happened (that and Washington money)
This "China's AI is taking your data and that's bad" is shockingly similar to "TikTok is taking your data and that's bad". Lots of US counterparts do the same thing, but I don't see (as much) media coverage about that.
Don Draper: "no no no, everyone else's cigarettes are dangerous. Lucky Strikes are... toasted."
The way I think of it is, I don't live in China, so regardless of my objections to their values or human rights abuses, why would CCP or an affiliated company care about me or ruin my life on the basis of or by abusing my data? A big part of why I care about privacy is I don't want to be filtering my every thought through consideration of whether the powers that be would approve, and US companies are way more relevant to that.
This is probably only a problem with the online version. In contrast to Google and OpenAI, they, like Meta, let you download the model and run it offline, where I presume they can't access any of this data.
Right, the offline version (if you have the hardware to run it) is completely under your control, and no one can take that away from you. Honestly nice to see that happen, I thought it would take several years.
Anyone using DeepSeek as a service the same way proprietary LLMs like ChatGPT are used is missing the point. The game-changer isn’t that a Chinese company like DeepSeek can compete with OpenAI and its ilk—it’s that, thanks to DeepSeek, any organization with a few million dollars to train and host their own model can now compete with OpenAI.
I'd like to look into that, how can I train an existing model further?
I'm only playing around with ollama, but like to do a bit more - mostly just to fulfill my needs to understand things - but have no idea where to start
"We store the information we collect in secure servers located in the People's Republic of China"
Now you Americans know how we Europeans feel when Google, Amazon and Facebook store our information on American servers. Hint: The protective wall between Chinese servers and their government is about as good as the one between American servers and their government - at least for non-US citizens. The last thin veil of privacy for Europeans has been ripped to shreds by Trump last week.
He killed the EU-US Data Privacy Framework. Theoretically, no company is allowed to transfer data of European citizens to US-based servers anymore. Sadly, Ursula von der Leyen is lacking the balls to act on this.
Thanks, managed to have it installed locally via PocketPal (Termux was giving me errors constantly on compile).
Out of curiosity, I made a very "interesting" prompt, and frankly I am not even surprised
EDIT: decided to be a little spicier, didn't fail to amuse me
We collect certain device and network connection information when you access the Service. This information includes your device model, operating system, keystroke patterns or rhythms, IP address, and system language. We also collect service-related, diagnostic, and performance information, including crash reports and performance logs. We automatically assign you a device ID and user ID. Where you log-in from multiple devices, we use information such as your device ID and user ID to identify your activity across devices to give you a seamless log-in experience and for security purposes.
It looks to me that they are using it to identify the user uniquely, maybe also related to captcha to prevent bots (it's common practice to capture mouse and keyboard while resolving captchas to see if the movement is human-like).
Maybe. They could also be doing things like paying attention to input cadence and typos/pre-send typo corrections to use as part of a fingerprint associated with the identifying information a user gives them when creating an account so that they can then attempt to detect the user elsewhere on the web whether they are using an identifying account or not.
Not usually. Keystroke info is different than text input; if you didn’t click onto any field and typed, it would only be captured if all keystrokes are being grabbed. It’s especially scary if you keep the app running in the bg and then type something and it still captures it. Not saying they’re doing that, but the privacy policy says they might.
The rhythm part is annoying, it’s commonly used to ID people even through things like ad blockers and DNS blocks. Could also (in theory) be used to capture what people are typing just by hearing how they type.
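To make the "rhythm" idea concrete, here is a minimal sketch of how typing cadence can be turned into a comparable profile. Everything in it is an assumption for illustration: the event format, the function names, and the 25 ms threshold are made up, and nothing here is a claim about what DeepSeek or anyone else actually runs.

```python
from statistics import mean

# Hypothetical event format: (key, press_time_ms, release_time_ms).
session_a = [("h", 0, 90), ("e", 150, 240), ("y", 300, 390)]
session_b = [("h", 0, 95), ("i", 160, 250)]
session_c = [("h", 0, 40), ("i", 300, 345)]


def timing_features(events):
    """Return (mean dwell, mean flight) in milliseconds for one session."""
    # Dwell time: how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    mean_flight = mean(flight) if flight else 0.0
    return (mean(dwell), mean_flight)


def profile_distance(a, b):
    """Euclidean distance between two (dwell, flight) profiles."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def same_typist(a, b, threshold_ms=25.0):
    """Crude match: profiles closer than the (arbitrary) threshold."""
    return profile_distance(a, b) <= threshold_ms
```

Note that the profile ignores which keys were pressed: only the timing is used, which is why this kind of signal can survive cookie clearing or a different account.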
Our data's just too valuable for these parasites. Data privacy laws may eventually pass to compel software companies to store everything in US servers only.
Excellent Point. If that's the case though, then wouldn't other countries follow suit which still limits big tech's reach and makes them less profitable and less powerful? Idk. Guess we'll see how it plays out. Either way, I'm staying as far from those ecosystems as possible to at least try to mitigate some of what they do. I'll never be totally successful, genie is out of the bottle, but we can at least attempt.
This article is what US propaganda looks like folks. Mashable should be ashamed.
Literally all AI companies do this to run their services. Except you can actually download Deepseek and run it completely securely on your own devices.
You know who doesn't allow that security? OpenAI and the other US companies currently being screwed.
every google site has been doing this for years too. every comment we write in youtube and discard before posting, it's being recorded. this isn't news at all.
Same as Chrome's magic bar, or the Android keyboard, no? So in the end, is the USA doing it good because "democracy" (never ever with napalm), while China doing it is bad because of human rights violations (the USA never did anything like this)?
Seriously this. Nothing that China is accused of doing is any worse than what i know America has done. If it's the Chinese Communist Party stealing your data at least you know it won't be used to inject ads everywhere you go on the internet
At least they're transparent about it, unlike american companies that hide behind convoluted terms of services and then sell the data behind your back but it's technically legal.
China's like "yeah we collect everything". I can appreciate the honesty.
Not excusing Chinese companies but everyone does the same shit. I bet a lot of US companies that behave the same or worse will be looking for trade barriers to protect their business so their interests will be stoking fear of Chinese competitors. I don't really give a shit which country is doing it, I am not buying what they are selling.
US companies have a stranglehold on government, education and business and are getting access to my families data despite my personal objections. Far more concerned about that than a Chinese service I have no intention of using.
Deepseek can at least be self hosted if you want AI in your life. I can happily live without it.
the company states that it may share user information to "comply with applicable law, legal process, or government requests."
Literally every company's privacy policy here in the US basically just says that too.
Not only does DeepSeek collect "text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services," but it also collects information from your device, including "device model, operating system, keystroke patterns or rhythms, IP address, and system language."
Breaking news, company with chatbot you send messages to uses and stores the messages you send, and also does what practically every other app does for demographic statistics gathering and optimizations.
Companies with AI models like Google, Meta, and OpenAI collect similar troves of information, but their privacy policies do not mention collecting keystrokes. There's also the added issue that DeepSeek sends your user data straight to Chinese servers.
They didn't use the word keystrokes, therefore they don't collect them? Of course they collect keystrokes, how else would you type anything into these apps?
In DeepSeek's privacy policy, there's no mention of the security of its servers. There's nothing about whether data is encrypted, either stored or in transmission, and zero information about safeguards to prevent unauthorized access.
This is the only thing that seems disturbing to me, compared to what we'd like to expect based on the context of what DeepSeek is. Of course, this was proven recently in practice to be terrible policy, so I assume they might shore up their defenses a bit.
All the articles that talk about this as if it's some big revelation just boil down to "company does exactly what every other big tech company does in America, except in China"
Collecting keystrokes is very different from collecting text inputted into fields. Keystroke rhythm is even more alarming, as it is often used to identify users despite their privacy settings, or to work out what’s being typed from audio.
Your argument that this is no different than other apps is complete crap. Don’t trust any app that collects that information
Yes, not ALL other apps do that, but the comment was specifically talking about companies like Google and Meta... they definitely do collect incomplete strings from search forms (down to individual characters) when they display search suggestions, for example. They might not mention "keystrokes" in the legal text, but I don't see why they wouldn't be able to extrapolate your typing pattern since they do have the timing information which should be enough data to, at some level, profile it.
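As a rough illustration of that last point, here is a sketch of how per-character suggestion requests already carry timing. The log format (timestamp in ms plus the prefix sent) and the function name are hypothetical; real services log different things, and this only shows that the information is recoverable in principle.

```python
def intervals_from_prefix_log(log):
    """Recover (character, ms since previous request) pairs from a
    suggestion-request log of (timestamp_ms, prefix) entries."""
    out = []
    for (t_prev, p_prev), (t_cur, p_cur) in zip(log, log[1:]):
        # Only count requests that extend the previous prefix by exactly
        # one character; deletions and pastes are skipped.
        if len(p_cur) == len(p_prev) + 1 and p_cur.startswith(p_prev):
            out.append((p_cur[-1], t_cur - t_prev))
    return out


# Made-up log: the user types "deep", then deletes one character.
log = [(1000, "d"), (1120, "de"), (1210, "dee"), (1450, "deep"), (1700, "dee")]
intervals = intervals_from_prefix_log(log)
```

So even without the word "keystrokes" in the legal text, a search box that fires a request per character hands the server a timing series for free.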
Nothing alleged about it. The main app wraps your prompt in a China-friendly one - at this point, I think people have mined the prompt itself? Scummy, sure, but it's also the same way that literally every other online AI service works.
Nothing alleged about it. The main app wraps your prompt in a China-friendly one
I asked it about whether the takeover of Hong Kong was met with international criticism. First I saw an answer saying yes, and a few paragraphs of examples and elaborations.
A few minutes later the answer I already saw was replaced with "sorry, that's outside of my scope." I think with the flood of new traffic to Deepseek, they are scaling up reviews of chat content.
Any chat AI logs your keystrokes and your inputs in order to work and to update its LLM. The PP and TOS are the same as, or even better than, those of the US competitors. DeepSeek is open source.
Anyway I prefer Andisearch and its PP, the best of all these big tech AIs.
Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.
The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loathe to reveal its secret sauce.
I swear people do not understand how the internet works.
Anything you use on a remote server is going to be seen to some degree. They may or may not keep track of you, but you can't be surprised if they are. If you run the model locally, there is no indication it is sending anything anywhere. It runs using the same open source LLM tools that run all the other models you can run locally.
This is very much like someone doing surprised pikachu when they find out that facebook saves all the photos they upload to facebook or that gmail can read your email.
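For anyone who wants to try the local route, a minimal sketch using only the Python standard library. It assumes an Ollama server running on its default port (11434) and a model tag such as `deepseek-r1:7b` that you have already pulled; the tag and the helper names (`build_request`, `is_local`, `ask`) are assumptions for illustration, so adjust them to your setup.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1:7b"):
    """Build (but do not send) a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_local(req):
    """True if the request targets this machine's loopback address."""
    return urlparse(req.full_url).hostname in {"localhost", "127.0.0.1", "::1"}


def ask(prompt):
    """Send the prompt to the local server and return the generated text."""
    req = build_request(prompt)
    if not is_local(req):
        raise RuntimeError("refusing to send the prompt off this machine")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("Why is the sky blue?"))` should return text generated entirely on your own hardware. The `is_local` check is just a guard against pointing the URL at a remote host by mistake.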
Yeah, uh... If you think that American companies aren't doing this same thing and handing your data over to the government without a warrant among other bad uses, I have some bad news for you. This is pretty much par for the course, and I'm pretty sure that we're witnessing a well financed negative media blitz happening to try and keep OpenAI from getting all of its spaghetti spilled. Watch for the government to try and ban deepseek for "national security" reasons soon.
And why is that an issue? It's typing data sent to a language model. What nefarious info might they be looking for? Learning to imitate humans? Fingerprinting? Making the best virtual keyboard asmr?
If you've got nothing to hide you don't have to worry ?
edit : For clarification, i consider "If you've got nothing to hide you don't have to worry" to be a naive argument, at best, in any privacy conversation, but I'm not averse to a well-reasoned argument to the contrary.
The wording here was unclear, what i mean to ask was:
"do you believe If you've got nothing to hide you don't have to worry ?"
Building my entire data model around the Tiananmen Square copypasta. I can run this thing on a Raspberry Pi plugged into a particularly starchy potato and it reliably returns the only answer I've thought to ask it.
detective conan sure had a hard time cracking the case!
"The personal information we collect from you may be stored on a server located outside of the country where you live. We store the information we collect in secure servers located in the People's Republic of China," the privacy policy reads.
Oh the horror! Let's look at what our glorious spawns-of-techbro heroism has for us in store:
OpenAI processes your Personal Data for the purposes described in this Privacy Policy on servers located in various jurisdictions, including processing and storing your Personal Data in our facilities and servers in the United States. While data protection law varies by country, we apply the protections described in this policy to your Personal Data regardless of where it is processed, and only transfer that data pursuant to legally valid transfer mechanisms.
When you access our website or Services, your personal data may be transferred to our servers in the US, or to other countries outside the European Economic Area (“EEA”) and the UK. This may be a direct provision of your personal data to us, or a transfer that we or a third party make.
So not only is your data "possibly" stored in one country, now there's a possibility of it being stored in many different countries. Where's the outcry for that?
Ok, so maybe your data being under the jurisdiction of another country is sus, right?
In another section about how DeepSeek shares user data, the company states that it may share user information to "comply with applicable law, legal process, or government requests."
OH MY GOD SOUND THE ALARM!
ChatGPT:
We may use Personal Data for the following purposes: [...] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.
Claude:
Pursuant to regulatory or legal requirements, safety, rights of others, and to enforce our rights or our terms. We may disclose personal data to governmental regulatory authorities as required by law, including for legal, tax or accounting purposes, in response to their requests for such information or to assist in investigations. We may also disclose personal data to third parties in connection with claims, disputes or litigation, when otherwise permitted or required by law, or if we determine its disclosure is necessary to protect the health and safety of you or any other person, to protect against fraud or credit risk, to enforce our legal rights or the legal rights of others, to enforce contractual commitments that you have made, or as otherwise permitted or required by applicable law.
So not only can your data be subject to the authorities, but it's also handed out to 3rd parties (mind you, DeepSeek does the exact same, so why is it any surprise?).
Not only does DeepSeek collect "text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services," ...
🤦... You get the idea now, bother yourself with the privacy policies of the respective contemporaries and CTRL + F to "User Content" or "User Input".. Same fucking shit.
Companies with AI models like Google, Meta, and OpenAI collect similar troves of information, but their privacy policies do not mention collecting keystrokes.
Yes, collecting keystrokes is probably the oddest thing here. To compare data farming giants with a decade and a half's worth of data collection to a startup in terms of data collection is so astronomically dumb.
I could go on but I'm bored now. Do your own research.
Not quite on topic but semi related... It's reasons like this that I started reading privacy policies many times before signing up for a service.
People would be surprised at some of the extremely concerning things that are listed in there. Some is for good reason but some stuff is absolutely unnecessary and should be an issue for some people.
When I read DeepSeek's privacy policy, I was creeped out by the invasiveness of the keystrokes thing. Then I realised that ChatGPT is just as creepy, but less upfront about it, and DeepSeek's relative transparency caused me to see them in a more favourable light
It doesn't. They run using stuff like Ollama or other LLM tools, all of the hobbyist ones are open source. All the model is is the inputs, node weights and connections, and outputs.
LLMs, or neural nets at large, are kind of a "black box" but there's no actual code that gets executed from the model when you run them, it's just processed by the host software based on the rules for how these work. The "black box" part is mostly because these are so complex we don't actually know exactly what it is doing or how it outputs answers, only that it works to a degree. It's a digital representation of analog brains.
People have also been doing a ton of hacking at it, retraining, and other modifications that would show anything like that if it could exist.
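A toy sketch of what "the model is just weights" means in practice. The JSON "model file" and its field names are made up for illustration; real formats are far larger, but the division of labour is the same: the file holds numbers, and the host software does the arithmetic.

```python
import json
import math

# A toy "model file": nothing but numbers (hypothetical format).
MODEL_JSON = '{"w": [[0.5, -1.0], [1.5, 2.0]], "b": [0.0, 1.0]}'


def load_model(text):
    """Parse the weights. This reads data; it does not execute anything."""
    return json.loads(text)


def forward(model, x):
    """One dense layer with tanh, computed entirely by host code."""
    out = []
    for row, bias in zip(model["w"], model["b"]):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(math.tanh(z))
    return out


model = load_model(MODEL_JSON)
output = forward(model, [1.0, 0.5])
```

This holds for data-only weight formats such as GGUF or safetensors. Pickle-based checkpoints are a different matter, since unpickling can run code, which is one reason the local tools favour the data-only formats.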
They should store the data in US servers like OpenAI does. Apparently then Mashable won't write an article about it.
The criticism thrown at DeepSeek in the past days is just as applicable to American AI models. But when that was brought up in the past, it was "making things political".
This article is about the app, which does not run the model locally. Why would you doubt that a Chinese app which openly claims they send your data to China, actually does so?
It seems like a smear piece because it makes it sound like DeepSeek is doing something that the others aren't, while the truth is that every single one of them collects your data.
At best, it's disingenuous. At worst, with the ability to run locally, it's a blatant lie.
Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.
The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loathe to reveal its secret sauce.
The runner is open source, and that's what matters in this discussion. If you host the model on your own servers, you can ensure that no corporation (American or Chinese) has access to your data. Access to the training code and data is irrelevant here.
DeepSeek's privacy policy raises concerns about a U.S. foreign adversary's ability to access U.S. user data. Users are familiar with the massive amounts of data U.S. tech companies collect, but China's cybersecurity laws make it much easier for the government to demand data from its tech companies. Additionally, DeepSeek users have reported instances of censorship, when it comes to criticizing the Chinese government or asking about Tiananmen Square.
Users have been shown that both governments are untrustworthy so what the fuck are we supposed to do?
Am I supposed to not read this article as panic? I know this is Mashable but the media overall is no longer unbiased and now there’s gonna be more gremlins to watch for in pro-US corpo AI propaganda and media ownership having stakes in AI.
Ok, so they'll ban it under that guise to appease US companies, same as TikTok. I really didn't care about TikTok since it's all brain rot to me but this might actually be a tool I'll use if it's as efficient as they say.
no sh*t! now tell me, not that it's correct, but what can the chinese intelligence apparatus do to me vs. what the u.s. intelligence apparatus (which has been collecting intelligence about me since i was born) can do to me?
They both can and frequently do influence the information you are exposed to on social media to influence your decision making. Not you specifically, unless you are someone very important, but your demographic in a broader sense. The more data they have on you, the more effective this process is.
and that's what superpowers do, but living in a third world country i'm yet to see the chinese putsch us as the u.s. did during the cold war and beyond, with all due consequences. sorry about my lack of goodwill towards the department of state.
Assuming that DeepSeek really is logging keystrokes (they provided no evidence: who were they quoting?), that is unfortunately not uncommon. As shown by their TikTok pearl clutching, corporate media regularly goes for maximalist cold war fearmongering.
I've noticed that this "there is no proof!" or "where's the evidence?" all of a sudden has become popular. You have people saying it even when they're talking about a very specific statement of a fact that's very specifically and easily verifiable.
that is unfortunately not uncommon
Completely true. A lot of web sites monitor everything you do on them, and can play it back for anyone who's curious about optimizing the UX or for any other less innocent reason. Generally I think there's not much specific in their privacy policy about it when they do. It's not surprising that this one is also doing that, accompanied by really a pretty minor line in their privacy policy to go along with it, I completely agree with you here.
As shown by their TikTok pearl clutching, corporate media regularly goes for maximalist cold war fearmongering.
Personally, I wish the corporate media would pearl-clutch a little bit more about how explicitly malicious to our interests our computing devices have become. "Everyone does it, so it's not a big deal after all" is a common take to have, but it's the exact opposite of the one that I personally have on it.
Everyone does it, so it’s not a big deal after all
...and I think that's you completely misreading what people are saying.
We're saying that it's bunk for the corporate media to portray it as this dangerous thing when they refuse to report similarly on US companies doing the same with the same ferocity.
I think most people agree with you, that our privacy protections are fucking abysmal and no company should be being allowed to do this stuff. Hell, that's like the entire thrust of Ed Zitron's entire fucking blog: that none of these companies should get away with this.
It's like when Facebook got fined a paltry sum for being caught lying about their video metrics and literally putting businesses like CollegeHumor out of business because they "pivoted to facebook video" to grab those high metrics... which never materialized because Facebook was ratfucking lying to people. They should have been shut down and put out of business for that, not fined less than they made ripping off people.
People are sick of the companies here getting a pass, and the media gives them a pass. It's more that you can't make freaked out headlines like this about TikTok and DeepSeek and not understand that everyone is rolling their fucking eyes because we're all like "it's no worse than what US companies already do to us." That doesn't mean we like it or are okay with it. It means we're rolling our eyes at a fucking insipid news media that's obviously lying to us for the sake of private American companies' profit, not because they care about rightfully informing American citizenry about what is happening.
All of us fucking hate it, but what the fuck do you expect us as individuals to do about it? Folks like me have been voting Blue for 25 fucking years with fuck-all to show for it on issues like these. So why's it our job to explain that we don't support it, we just think it's dumb as fuck when a foreign company is doing the same thing and now suddenly that's evil, but our guys doing it is somehow fine. What we have issue with is the hypocrisy.
Yes. I also like how the alarming take on it is not "People are typing their passwords / medical histories / employer's source code into ChatGPT and from there it goes straight into the training data not only to be stored forever in the corpus, but also sometimes, to be extracted at a later date by any yahoo who knows the way to tease it back out from ChatGPT via the right carefully crafted prompting!"
But instead it is "When you type things, they can see what you type! The keystrokes!"
Isn't it open source? If so it should be near trivial to get rid of all of that.
If it's closed source I wouldn't touch it with a ten-foot pole, it's the same reason I rarely use ChatGPT, it's just freely giving away your personal data to OpenAI.
I feel like "their" is more common. I do deliberately say "its" for companies because companies aren't people and don't deserve people pronouns. Countries seem more like a collection of people, so I use "their".
If someone knows more about grammar feel free to correct me.
No I’m not surprised at all. This is necessary for any kind of auto save and auto complete. Not happy about my shit being stored in China, but “collects every keystroke” isn’t really news anymore.
If you’re worried about this kind of behavior, don’t use any website with auto save or auto complete, period.
Not in the way you think. They aren't constantly training when interacting, that would be way more inefficient than what US AI companies have been doing.
It might be added to the training data, but a lot of training data now is apparently synthetic and generated by other models because while you might get garbage, it gives more control over the type of data and shape it takes, which makes it more efficient to train for specific domains.
It's not really trust. It's more like hate. I hate this country. I hate what it's turned into. I hate the oligarchy. I hate the captured politicians. I hate that the supposed opposition to actual fascists is a bunch of whinging cunts whose primary concerns are about rules and norms. I hate that we are everything our propaganda told us about the rest of the world. I hate that our entire economic system is just a jenga stack of pyramid schemes.
We are a failed state. And if what I type on my phone helps our geopolitical "adversaries" get even 1 human hairs worth of advantage on this shithole then I'll give it gladly.
I want them to win. And they will. Short of global nuclear war they can't lose. Venture capital can't even dream of competing with central planning. Not even a question.