Lemmy.ca Support / Questions @lemmy.ca BuoyantCitrus @lemmy.ca 1y ago

Privacy / data retention policy

It'd be nice to (eventually!) see a link laying out a privacy policy for the instance, something like: https://newsie.social/privacy-policy

I'd especially be interested to know how long you associate the IP addresses we visit from with our accounts, who can see that info (and our emails), what other PII you store, and how long deleted posts/accounts are stored for.

(Totally get and very much appreciate that smorks &co have a lot on their plates just getting this place off the ground, not trying to demand additional work, just a suggestion. Seems like it'd take some thinking to balance with eg. a good backup regimen.)


27 comments

i would like to post a privacy policy when I have some free time.

as for your IP, i am anonymizing the ip stored in the nginx logs, basically just storing .0 for the last octet of the ip address, adapted from here: https://www.supertechcrew.com/anonymizing-logs-nginx-apache/

that used to be in one of the lemmy recommended nginx configs, but it appears to have been removed. but it's still being used on this instance, and I should double-check to make sure it also works for IPv6 addresses.
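To illustrate the idea (this is not the nginx config itself, which lives at the link above), here is a minimal Python sketch of the same truncation. The `anonymize_ip` helper is hypothetical, and the IPv6 prefix length of /48 is an assumption chosen for illustration, not something this instance is confirmed to use.

```python
import ipaddress


def anonymize_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host part of an address so only the network remains.

    v4_prefix=24 matches "store .0 for the last octet".
    v6_prefix=48 is an assumed, illustrative choice for IPv6.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("203.0.113.77"))
    print(anonymize_ip("2001:db8:abcd:1234:5678:9abc:def0:1234"))
```

Handling both address families through the same function makes it harder to forget the IPv6 case, which is the gap mentioned above.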

i also don't know if the IP is stored in the DB as well, i'd have to look through the lemmy code/tables to see if that's the case.

i'm currently the only one with access to the server itself, so it would only be me who has access to that info.

i don't think there's any other PII that is stored?

As for how long deleted posts/comments/accounts are stored for? i'm not entirely sure. i know that typically deleted posts are just flagged in the db as "deleted", but i think it also changes the content to deleted by creator or something, and i don't believe there's any way to get the original post/comment back? there's also a way for admins to purge users/communities/posts/comments, which deletes them from the database, but i don't think there's anything that does that automatically after a certain time period.

I know i have a lot of "i think"'s in there, so all of this is a best guess. I'll do some digging and testing at some point so I can firmly answer these questions.
- Thanks! That's everything I'd hoped to get rolling and then some so clearly you have it well in hand.
  
  Really what I was aiming for was a recognition that we shouldn't have to guess about this stuff and it should be straightforwardly laid out. Of course that's rarely the case but I see Lemmy as a way to collaborate in building the sort of social media we want rather than what we're given as a byproduct of others' interests.
  
  Deletion on other instances isn't something we can control but we can point that out so people understand. And for our part we can understand what's happening on our systems, ensure it's in line with what we want (eg. if it isn't expunged you can add a cron job to do it after X days or w/e) and be transparent about it.
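As a rough sketch of what such a scheduled job would have to decide, the snippet below picks out items that were flagged as deleted more than X days ago. The `Item` record and `due_for_purge` helper are hypothetical and do not reflect Lemmy's actual tables; the actual purge step is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a post or comment row."""
    id: int
    deleted_at: Optional[datetime]  # None means the item is not deleted


def due_for_purge(items: List[Item], now: datetime, retention_days: int) -> List[int]:
    """Return ids of items deleted at least `retention_days` before `now`."""
    cutoff = now - timedelta(days=retention_days)
    return [
        item.id
        for item in items
        if item.deleted_at is not None and item.deleted_at <= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2023, 7, 1, tzinfo=timezone.utc)
    items = [
        Item(1, datetime(2023, 5, 1, tzinfo=timezone.utc)),   # deleted long ago
        Item(2, datetime(2023, 6, 25, tzinfo=timezone.utc)),  # deleted recently
        Item(3, None),                                        # still live
    ]
    print(due_for_purge(items, now, retention_days=30))
```

Whatever value of X is chosen would also be the number to state in a published policy, and backups taken inside that window would still hold the content.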
  
  I didn't expect that you'd go as far as not logging exact IPs even at the HTTP level; I fear that you will have to walk that back a bit over time in order to use things like fail2ban and more sophisticated tools to quickly respond to abuse and DoS attempts. Alas, time and time again it has been proven that there are some people out there who just like to mess with stuff, and we need to be proactively resilient against what's unfortunately inevitable. Similarly, there'll be more subtle stuff, like when it becomes obvious that some set of IPs has been used in mass creation of accounts for sockpuppets or LLM bots, and it'd be useful to retain them for a bit so we'd have the option of going back and reviewing what they put out.
  
  i don’t think there’s any other PII that is stored?
  
  We have the option to give you our emails too, is that only visible to you?
  
  I know i have a lot of “i think”'s in there, so all of this is a best guess. I’ll do some digging and testing at some point so I can firmly answer these questions.
  
  See? Right person for the job. Holler if you need anything, I get a general sense there's a willingness to pitch in around here.
  
  We have the option to give you our emails too, is that only visible to you?
  
  yes, that's only visible to me. 99% sure that it doesn't leave this instance.
  
  See? Right person for the job. Holler if you need anything, I get a general sense there’s a willingness to pitch in around here.
  
  agreed. everyone has been super supportive and helpful so far. i will let you know what i find, and will reach out if i need help with anything. thanks!
  
  Thanks @smorks; you’re doing an awesome job! One thing not covered yet is your backup policy; it’s possible that items would get backed up before they were deleted.
  
  Personally, for my server box I have two local backups and one offsite backup that rotates quarterly. Such a setup can definitely capture information that later changes, such as deleted posts and IP addresses that are logged before they’re rewritten. Something to consider, as backups are important for disaster recovery, especially when handling other people’s data.
  
  thank you for the reminder. i'm currently doing weekly backups offsite, which includes the database and all pictrs data (image data). the off-site backups are all encrypted (i'm currently using restic for this), so again, i'm currently the only one with access to it.
  
  Ugh, yeah. I've been torn about ringing alarm bells to local admins about this as it is a thankless job for them and I'd hate to start scaring the core group facilitating adoption, but the other side is if this isn't resolved soon Lemmy is going to find a lot of instances having a bad time and disappearing.
  
  As mentioned in another comment, this really has to start as a policy framework adopted by the larger Lemmy community and modified to suit the conditions of the local. As you've highlighted, the whole federated / control of posts is not one that is easily grasped by the end user (or some admins elsewhere I've found). The argument that "everything you put on the internet is there forever" doesn't address that there is a huge distinction between a capture and a federated, distributed and indexed copy.
  
  I grew up in the wild-wild-west of the early internet and have made an informed decision on how to engage on this platform. It is very evident from the discussions I've seen across the Lemmy-verse that most are completely unaware.
  
  Although I often consult on such things, I am not a lawyer and hesitate to get too involved myself. However I too am available to sound ideas.
  
  Here are the current Lemmy issues I've found on the subject, if anyone has the capacity and desire to contribute to this issue, I'd start here. https://github.com/LemmyNet/lemmy/issues/721 https://github.com/LemmyNet/lemmy-ui/issues/1347
  
  I couldn't help myself.
  
  I adapted the Mastodon privacy policy, which was adapted from Discourse.
  
  https://github.com/BanzooIO/federated_policies_and_tos/blob/main/lemmy-privacy-policy.md
  
  If anyone is interested in providing input or questions, I've started discussions here: https://lemmy.ca/post/821266 (hosted on lemmy.ml)
  
  @[email protected] it wasn't letting me tag you in the last comment for some reason. As you expressed interest in helping, feel free to review and join the discussion linked above.
  
  Didn't have time to get to that today but I will take a look (maybe this weekend?) and I appreciate the initiative! Though, I also kinda think it's jumping the gun a bit since there first needs to be understanding as to what the situation is before we can describe it and I did think we should start by describing where we are before trying to change much. But maybe not, gotta start somewhere I suppose and sounds like you've got something concrete here...
  
  I agree. When you take a look you'll see a lot of disclaimer on my part regarding use of the policy.
  
  I offered it as a starting point to our local admin, but already suggested he wait for the wider, Lemmy-wide input I hope to get, as I am pushing this as an issue hardcore.
  
  @smorks has demonstrated a high level of competency and care in my books and personally I couldn't care less if he published one or not as a result. But for his safety, and the wider Lemmy community, this has to be addressed. For instance, some admins are simply flat out blocking EU incoming connections to mitigate not having the required policies published.
  
  Also cognisant of how misunderstood federation is among the many non-technical newcomers, and how terrifying the policy may seem on first read, I have drafted this policy primer that an admin could potentially use to express a clear distinction between what their responsibilities are and in what ways it is the user's responsibility. With proper education and care on behalf of the user, this could be a much safer platform than almost any other out there.
  
  https://github.com/BanzooIO/federated_policies_and_tos/blob/main/optional-privacy-policy-intro.md
  
  Not pushing, or even petitioning our local to adopt any of it, just putting it out there for reference.
  
  Read it over and want to thank you for taking this on, it's a good start covering most / all of what one would expect to be laid out in something like this. Tedious but well worth doing! I do think for the official policy it might well be worth the community crowdfunding a lawyer who has solid experience in such things, maybe EFF can help and we can do a sort of donation drive for them or similar?
  
  Don't have tonnes of specific advice but a few things stood out:
  
  Retain the IP addresses associated with registered users no more than 12 months.
  
  Seems pretty long and I know this is a template so I imagine smorks will aim for much less given that he even makes an attempt to anonymise nginx logs. I think we might want to keep the template lower too just to nudge people in the right direction?
  
  You also understand that although there are controls to prevent the distribution of your email and IP address, due to the nature of federated services, all of your engagement on this platform should be considered public.
  
  I think this is a key point but the "although" calls the security of ip/email into question and seems to potentially lump it in with the other stuff. Maybe split them out somehow?
  
  {{your_instance_name}} makes every effort to secure your email and IP address and limit access to them. Due to the federated nature of this platform, we cannot provide similar guarantees for your direct messages as they are exposed to other instances outside of our control so it is best to consider them potentially public along with any other interactions you make.
  
  Which does cause me to wonder: how is voting federated, do other instances see which users up/downvoted a comment from lemmy.ca or does lemmy.ca just provide vote totals for the instance?
  
  And I think a plain language add-on explainer thingy is great, the fediverse is a bit confusing. I found your draft a bit long and a bit, I dunno, overfamiliar? Not saying I could do better, it's just a hard thing to be conversational without being twee I suspect. Definitely respect your making the effort, it's a worthwhile contribution in its own right and lays out what's valuable and different about this space along with its limitations. Although it might be scope creep to include quite so much detail about how Alpha, Meta, etc. operate, I like your concise explanation of "they're probably not listening because what they do with metadata is kinda more powerful". I often struggle with this, as "I know, it's like they listen!" is a common reaction that people have in support of my aversion to eg. installing Meta's apps on my phone.
  
  On second thought, I wonder if this could be even more general and just a really polished version explaining the overall gist of the platform that instances can link to at join-lemmy.org. Like a section 1b "Why federated?" after https://join-lemmy.org/docs/index.html#introduction
  
  Thanks for taking the time to look it over! As I've expressed, this is really a Lemmy wide initiative, and as you've suggested, something that warrants a community fundraising effort to provide proper legal oversight.
  
  Seems pretty long and I know this is a template so I imagine smorks will aim for much less given that he even makes an attempt to anonymise nginx logs. I think we might want to keep the template lower too just to nudge people in the right direction?
  
  Meant to make this an admin-supplied variable and have now updated. You've caught onto the spirit of what I am doing here though; it is not just intended as a document to inform users but to help admins navigate their responsibilities. That is why I have given the example of disclosure in what I see as a huge potential issue with the PostgreSQL SSL support. This will hopefully make a potentially inexperienced admin take pause when their server is being tach'd out and they decide to host the DB outside of the local host without a proper mitigating strategy (and I have seen this happen before with very experienced admins in a commercial setting).
  
  I think this is a key point but the “although” calls the security of ip/email into question and seems to potentially lump it in with the other stuff. Maybe split them out somehow?
  
  Agreed. I kind of see how on one hand I'm saying that component is secure while also saying it isn't, without making the distinction between the user-submitted public data and the traceable data that is being protected. I'll figure out a better way to partition that.
  
  Which does cause me to wonder: how is voting federated, do other instances see which users up/downvoted a comment from lemmy.ca or does lemmy.ca just provide vote totals for the instance?
  
  This is my big concern here and why despite telling myself to stay out of it I got involved. A lot of people, very experienced people, and some admins, do not have a full picture of how this works yet. Your votes are entirely public, there is just the UI choice in Lemmy to not display them. On other interoperable platforms this data becomes public. When this comes up there is a chorus of people chiming in, "don't post anything you don't want public on the internet". There is a difference between potentially scraped or captured copies and a copy that is distributed by design. There are two different goals: a monolith platform has a measure of control in how your engagement is made public while being completely open to being tracked. A federated system, by design, has limited control over how public your engagement is (and remains) but a high level of tracking protection. This maybe started out as a group of largely technical users that understands this distinction, but as adoption grows so does the risk of this distinction not being well understood.
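To make the point about votes concrete, here is a minimal sketch. It assumes votes reach other instances as ActivityStreams `Like` / `Dislike` activities that name the voting `actor`; that is consistent with the description above, but the exact payload Lemmy sends is not shown in this thread. The URLs and the `voters_by_object` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def voters_by_object(activities: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Group voting actors by the object they voted on.

    Shows that a receiving server holding per-user vote activities can
    reconstruct who voted, regardless of what any UI chooses to display.
    """
    seen = defaultdict(lambda: {"Like": [], "Dislike": []})
    for activity in activities:
        kind = activity.get("type")
        if kind in ("Like", "Dislike"):
            seen[activity["object"]][kind].append(activity["actor"])
    return dict(seen)


if __name__ == "__main__":
    # Hypothetical, simplified activities as a remote instance might receive them
    received = [
        {"type": "Like", "actor": "https://lemmy.ca/u/alice",
         "object": "https://lemmy.ca/comment/1"},
        {"type": "Dislike", "actor": "https://example.social/u/bob",
         "object": "https://lemmy.ca/comment/1"},
        {"type": "Create"},  # non-vote activities are ignored
    ]
    print(voters_by_object(received))
```

Under that assumption, hiding voter names is a display decision on each platform, not a property of the data that is distributed.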
  
  I found your draft a bit long and a bit, I dunno, overfamiliar?
  
  Yeah, I am going to work on a "lite" version eventually. It is not a simple task to educate in this domain where you have two distinct ideologies on the same subject.
  
  On second thought, I wonder if this could be even more general and just a really polished version explaining the overall gist of the platform that instances can link to at join-lemmy.org. Like a section 1b “Why federated?” after https://join-lemmy.org/docs/index.html#introduction
  
  I am pushing this hardcore platform wide. I have confidence in our local admin and would like to see it protected here, but the scope of my goal has gone platform-wide.
  
  Thanks again for taking the time to provide input!
  
  pinging new admins here @[email protected] @[email protected] @[email protected]
  
  Know you're probably really busy, and this whole space is taking off fast, but this is really, really important to maintain your and your users' safety.
  
  See: https://lemmy.ca/post/948217
  
  Thanks for the tag, hadn't seen this. I agree that having a privacy policy is important, we'll chat and get back to you!
  
  I found your draft a bit long and a bit, I dunno, overfamiliar?
  
  I created a "lite" version. Tried getting GPT to summarize it and it failed horribly. Don't know if that says something good or bad about my writing or good or bad about the complicated nature of federation.
  
  https://github.com/BanzooIO/federated_policies_and_tos/blob/main/optional-privacy-policy-intro-lite.md
  
  Language still might not be for everyone, but hopefully gives admins a place to start in really making a clear distinction between their and the user's responsibility.
