The OP of that post did admit to purposely using bots for that demonstration.
I am not making this post specifically about that post. Rather, we need to collectively organize and find a method of dealing with this.
Defederation is a nuke-from-orbit approach, which WILL cause more harm than good over the long run.
Having admins proactively monitor their content and communities helps, as does enabling new-user approvals, captchas, email verification, etc. But this does not solve the problem.
The REAL problem
The real problem is that the fediverse is so open that there is NOTHING stopping dedicated bot owners and spammers from...
Creating new instances to host bots, and then federating with other servers. (Spinning up a new instance can be fully automated and done in UNDER 15 seconds.)
Hiring kids in Africa and India to create accounts for 2 cents an hour. (News post 1, post 2.)
Lemmy is EXTREMELY trusting. For example, go look up the stats for my instance (lemmyonline.com). I can assure you, I don't have 30k users and 1.2 million comments.
There is no built-in "real-time" method in the UI for admins to identify suspicious activity from their users; I am only able to fetch this data directly from the database. I don't think it is even exposed through the REST API.
What can happen if we don't identify a solution
We know Meta wants to infiltrate the fediverse. We know Reddit wants the fediverse to fail.
If a single user with limited technical resources can manipulate that content, as was proven above, what is going to happen when big corpo wants to swing its fist around?
Edits
Removed most of the images identifying individual instances. Some of those issues have already been taken care of, and I don't want to distract from the ACTUAL problem.
They got back to me and provided this comment:
Thank you - we increased our security and attempted to purge our bots three days ago - if further suspicious activity is detected, we want to hear about it.
Just wanted to point out that according to your stats, unless I'm misreading them, only 26 bots come from lemmy.world (which has open sign-ups and uses the "easy to break" (/s) captcha) and 16 from lemmy.ml (which doesn't have open sign-ups and relies on manual approvals).
For some perspective, lemmy.world has almost 48k users right now. Speaking of "corrective action" is a bit of a stretch IMO.
I haven't spun up an instance, so I don't have a good idea what the DB looks like, but are IP addresses captured on account signup and/or vote casting?
It isn't a silver bullet, but spinning up instances to cast votes for bot users is prohibitively more expensive than running a script on a single machine. If you've got IPs, you might be able to pinpoint bot activity and the accounts associated with it (until they get smarter, at least).
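To make the IP idea concrete, here is a minimal Python sketch, assuming you can export (username, signup IP) pairs from your instance. The record shape, the `suspicious_ips` helper, and the `max_accounts` threshold are hypothetical, not part of Lemmy. It groups accounts by the IP they signed up from and lists the IPs that registered more accounts than the threshold.

```python
from collections import defaultdict


def suspicious_ips(signups, max_accounts=3):
    """Return {ip: sorted usernames} for IPs that registered more than max_accounts accounts.

    `signups` is an iterable of (username, ip) pairs. This record shape is
    hypothetical; adapt it to however your instance stores signup IPs.
    """
    accounts_by_ip = defaultdict(list)
    for username, ip in signups:
        accounts_by_ip[ip].append(username)
    return {
        ip: sorted(users)
        for ip, users in accounts_by_ip.items()
        if len(users) > max_accounts
    }


if __name__ == "__main__":
    # Documentation-range IPs; all data here is made up for illustration.
    sample = [
        ("alice", "203.0.113.5"),
        ("bot001", "198.51.100.7"),
        ("bot002", "198.51.100.7"),
        ("bot003", "198.51.100.7"),
        ("bot004", "198.51.100.7"),
    ]
    print(suspicious_ips(sample))
    # prints {'198.51.100.7': ['bot001', 'bot002', 'bot003', 'bot004']}
```

Treat the output as leads for a human to check, not as a ban list: households, universities, VPNs, and carrier-grade NAT all put many legitimate users behind one address.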
Last I checked, they use a single bot to repost communities from Reddit, meaning you can just block that single user and get rid of all the lemmit.online content in your feed.
The place feels different today than it did just a couple of days ago, and it positively reeks of bots.
I'm seeing far fewer original posts and far more links to karma-farmer quality pabulum, all of which pretty much instantly somehow get hundreds of upvotes.
There are some that aren't just about money.
There are bots that mirror content from Reddit, just linking to it.
I've seen posts that are 3 or 4 crossposts (between community/instances) deep.
I don't have much to add other than that I am an experienced admin and was dismayed at how vulnerable Lemmy is. Having the option of open registrations with no checks is not great. No serious platform would allow that.
I don't know of a bulletproof way to weed out the bad actors, but a voting system that Lemmy could leverage, with a minimum reputation required to stay federated, might work. This would require some changes that I'm not sure the devs can or would make. Without any protection in place, people will get frustrated and abandon Lemmy. I would.
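A rough sketch of what such a reputation vote could look like, assuming each participating instance publishes a +1/-1 vote about other instances. The data layout, the `federation_allowlist` function, the `trusted_voters` set, and the `min_score` cutoff are all hypothetical; nothing like this exists in Lemmy as described in this thread. Counting only votes from already-trusted instances is one way to stop freshly spun-up bot instances from voting each other in.

```python
def federation_allowlist(votes, trusted_voters, min_score=2):
    """Return the set of instance domains whose reputation reaches min_score.

    `votes` maps a voting instance's domain to {target domain: vote}, where a
    positive vote counts +1 and a negative vote counts -1. Only ballots from
    `trusted_voters` are counted, and self-votes are ignored. All names and the
    data layout are hypothetical.
    """
    scores = {}
    for voter, ballot in votes.items():
        if voter not in trusted_voters:
            continue  # untrusted (e.g. brand-new) instances cannot vote
        for target, vote in ballot.items():
            if target == voter or vote == 0:
                continue  # skip self-votes and abstentions
            scores[target] = scores.get(target, 0) + (1 if vote > 0 else -1)
    return {domain for domain, score in scores.items() if score >= min_score}


if __name__ == "__main__":
    votes = {
        "a.example": {"c.example": 1, "d.example": -1},
        "b.example": {"c.example": 1, "d.example": 1},
        "spam.example": {"d.example": 1, "spam.example": 1},  # ignored: not trusted
    }
    print(federation_allowlist(votes, trusted_voters={"a.example", "b.example"}))
    # prints {'c.example'}
```

Note that this is default-deny: an instance nobody has voted on never reaches `min_score`. Whether unknown instances should start out federated or not is exactly the defederation trade-off being argued about here.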
When I made a post saying that 90% (now ~95%) of accounts on Lemmy are bots, the number of people telling me there's no proof and/or that a lot of people are joining from Reddit right now was astonishing.
Edit: one person told me that no one would make 1.6 million bots when there are only 150k-200k users on the platform. Like, WTF?
Another thing is that people are likely pre-creating bot accounts and then sitting on them in case additional protections are created...
The problem is, these accounts look to us just like any new user lurking around and getting a feel for the place; there's no way to distinguish them until they start acting like bots in some fashion.
Honestly, I’m interested to see how the federation handles this problem. Thank you for all the attention you’re bringing to it.
My fear is that we might overcorrect by becoming too defederation-happy, a fear you seem to share.
Since it seems like most of these bots are coming from established instances (rather than spoofing their own), I agree with you that the right approach seems to be for instance mods to maintain stricter signups (captcha, email verification, application, or other original methods). My hope is that federation will naturally lead to a “survival of the fittest” where more bot-ridden instances will copy the methods of the less bot-ridden instances.
I think an instance should only consider defederation if it’s already being plagued by bot interference from a particular instance. I don’t think defederation should be a pre-emptive action.
There is no built-in “real-time” method in the UI for admins to identify suspicious activity from their users; I am only able to fetch this data directly from the database. I don’t think it is even exposed through the REST API.
The people doing the development seem to have zero concern that all the major servers are crashing with nginx 500 errors on their front pages under routine, moderate loads, nothing close to what a major website handles. Nor is there any concern about alerting operators to internal federation failures, etc.
I am only able to fetch this data directly from the database.
I too had to resort to this, and published an open-source tool, primitive and inelegant, to try to get something out there for server operators.
I did update my post shortly before you posted this to include that, as well as to remove a lot of the data for individual instances, as it detracts from the point/problem I am trying to identify.
The data, however, is quite valuable in exposing that this WILL be a problem for us, especially if we do not identify a solution for it.
We need a better solution for this, rather than mass defederation.
In my opinion, that is going to greatly slow down the spread and influence of this platform. Also, IMO, these bots are purposely TRYING to get instances to defederate from each other.
Meta is pushing its "fediverse" thing. Reddit is trying to squash the fediverse. Honestly, it makes perfect sense that we have bots trying to upvote the idea of getting instances to defederate from each other.
Once everything is defederated, lots of communities will start to fall apart.
I agree. This is why I started the Fediseer, which makes it easy for any instance to be marked as safe through human review. If people cooperate on this, we can add all good instances, no matter how small, while spammers won't be able to easily spin up new instances and just spam.
The solution is to choose servers with admins who are enabling bot protections.
If admins are not using methods to dissuade bot signups, then they're not keeping their site clean for their users. They're being a bad admin.
If they're not protecting their site against bots, they're also not protecting the network against hosts. That makes them bad denizens of the Fediverse, and the rest of us should take action to protect the network.
And that means cutting ties with those who endanger it.
I think one of the most difficult things about dealing with the more common bots, spamming, reposting, etc. is that parsing all the commentary and dealing with it at a service-wide level is really hard to do, in terms of computing power and sheer volume of content. It seems to me that doing this at the instance level, with user numbers in the tens of thousands, is a heck of a lot more reasonable than doing it for a service with tens of millions of users.
What I'm getting at is that this really seems like something that could (maybe even should) be built into the instance moderation tools, at least some method of marking user activity as suspicious for further investigation by human admins/mods.
We're really operating on the assumption that people spinning up instances are acting in good faith until they prove otherwise. I think the first step is giving good-faith actors the tools to moderate effectively, then worrying about bad-faith admins.
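As an illustration of "marking user activity as suspicious for further investigation," here is a minimal rule-based sketch. The `AccountActivity` fields, the `flag_for_review` function, and both thresholds are hypothetical guesses, not Lemmy data or tested values. The point is that it produces a review queue with reasons attached rather than taking any action on its own.

```python
from dataclasses import dataclass


@dataclass
class AccountActivity:
    # Hypothetical per-account summary; not a Lemmy type.
    name: str
    account_age_days: float
    votes_last_hour: int


def flag_for_review(accounts, max_votes_per_hour=100, new_account_days=2):
    """Return (name, reasons) pairs for accounts that match any heuristic.

    Nothing is banned here: the result is a queue for human admins/mods.
    Thresholds are illustrative placeholders.
    """
    queue = []
    for acct in accounts:
        reasons = []
        if acct.votes_last_hour > max_votes_per_hour:
            reasons.append("high vote rate")
        # Brand-new accounts get a lower bar (a quarter of the normal limit).
        if (acct.account_age_days < new_account_days
                and acct.votes_last_hour > max_votes_per_hour // 4):
            reasons.append("new account voting heavily")
        if reasons:
            queue.append((acct.name, reasons))
    return queue


if __name__ == "__main__":
    accounts = [
        AccountActivity("alice", 400, 5),
        AccountActivity("bot1", 0.5, 300),
        AccountActivity("newbie", 1, 30),
    ]
    for name, reasons in flag_for_review(accounts):
        print(name, reasons)
```

Keeping the reasons alongside each name lets a moderator see why an account was flagged, which matters because rules like these will also catch some enthusiastic new humans.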
Reposting this as a comment, from a reply elsewhere in the thread.
If anything, there should be SOME centralization that allows other (known, somehow verified) instances to vote to disallow spammy instances from federating, in some way that couldn't be abused. This may lead to a fork down the road (think BTC vs BCH) due to community disagreements, but I don't really see any other way this doesn't become an absolute spamfest. As it stands now, one server admin could fill their own server with spam, and once it starts federating, EVERYONE gets flooded. This also makes it easy to DoS the system.
Asking instance admins to require CAPTCHA or whatever to defeat spam doesn't work when the instance admins are the ones creating spam servers to spam the federation.
I really hope some researchers will get interested in this and develop some cool solutions. Maybe we'll get lucky and they'll even implement them in Lemmy.
That feels like pretty much the only way you can easily filter bots out.
The best ID to check would be a government ID or a bank account ID. Governments and banks are absolutely obsessive about making sure someone really is who they claim to be.
Unfortunately, this is incompatible with anonymity, unless we trust the instance admin.
I really wish a good data scientist or ML person would jump into this thread.
I can easily dig through data and code, but someone who could perform intelligent anomaly detection would be a godsend right now.
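For a first pass at anomaly detection, one standard technique is the modified z-score (Iglewicz and Hoaglin), which uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so a handful of extreme bot accounts does not drag the baseline toward themselves. This sketch assumes you already have a per-account vote count (hypothetical input, and hypothetical function names); 3.5 is the commonly cited cutoff for this score.

```python
import statistics


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD, per Iglewicz and Hoaglin."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half the values equal the median.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def outlier_accounts(votes_per_account, threshold=3.5):
    """Return account names whose vote count is unusually HIGH (z > threshold)."""
    names = list(votes_per_account)
    scores = modified_z_scores([votes_per_account[n] for n in names])
    return [n for n, z in zip(names, scores) if z > threshold]


if __name__ == "__main__":
    # Made-up counts for illustration.
    counts = {"a": 10, "b": 12, "c": 11, "d": 9, "e": 500}
    print(outlier_accounts(counts))  # prints ['e']
```

If most accounts have identical counts (for example, thousands of idle accounts with zero votes), the MAD is 0 and every score comes back as 0; computing the scores over active accounts only avoids that.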
There are data scientists around, and we are monitoring where this goes.
The biggest problem I currently see is how to effectively share data while preserving privacy. Can this be solved without sharing emails and IP addresses, or would that be necessary? Maybe securely hashing emails and IP addresses is enough, but that would hide some important data.
Should that be shared only with trusted users?
Can we create a dataset where humans label bots, and then share it with the larger community (e.g., Kaggle) to help us with ideas?
There are options, and they will be built; it just cannot happen in a few days. People are working nonstop to fix (currently) more important issues.
Be patient, collect the data and let's work on solution.
And let's be nice to each other; we all have similar goals here.
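On the privacy question above, one common approach is a keyed hash (HMAC) rather than a plain hash. Instances that share the same secret key get matching tokens for the same e-mail or IP, so they can correlate accounts without exchanging raw values, while a plain SHA-256 of an IPv4 address is easy to reverse by brute force, since there are only about 4.3 billion of them. This is an illustration under assumptions, not an agreed format: the function names, the key handling, and the choice to hash a /24 (IPv4) or /48 (IPv6) prefix instead of the full address, to keep a coarse "same network" signal, are all hypothetical.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(value: str, key: bytes) -> str:
    """HMAC-SHA256 token: equal inputs with the same key give equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_email(email: str, key: bytes) -> str:
    # Normalize whitespace and case so trivial variants map to the same token.
    return pseudonymize(email.strip().lower(), key)


def pseudonymize_ip_prefix(ip: str, key: bytes) -> str:
    """Token for the /24 (IPv4) or /48 (IPv6) network containing `ip`.

    Hashing the prefix instead of the full address keeps a coarse
    "same network" signal while dropping the exact host.
    """
    addr = ipaddress.ip_address(ip)
    prefix_len = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return pseudonymize(str(network), key)


if __name__ == "__main__":
    key = b"shared-secret-distributed-out-of-band"  # hypothetical key
    print(pseudonymize_email("Bot001@Example.com ", key))
    print(pseudonymize_ip_prefix("198.51.100.7", key))
```

Anyone holding the key can still brute-force IPv4 tokens, which is one reason to give the key only to trusted parties.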
If you always had e-mail verification turned on, you can get rid of some of these junk sign-ups relatively easily. I wrote a guide for it here: https://lemdit.com/post/16430
From what I've seen, most of the bot sign-ups that are swelling instance User numbers wouldn't have passed e-mail verification. I think it was done mostly to prove a point, rather than an attempt to actually use those accounts.
Instances that didn't have e-mail verification turned on are in a much harder spot.
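The linked guide covers the actual procedure. Purely as a sketch of the selection logic, assuming you can export per-account records, the following lists accounts that never verified their e-mail, are older than a grace period, and have never posted or commented. `LocalAccount`, its fields, and `junk_signups` are hypothetical and do not reflect Lemmy's real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LocalAccount:
    # Hypothetical export format; not Lemmy's actual database schema.
    name: str
    created: datetime  # must be timezone-aware (UTC)
    email_verified: bool
    post_count: int
    comment_count: int


def junk_signups(accounts, now=None, grace=timedelta(days=7)):
    """Names of accounts that never verified their e-mail, are older than
    `grace`, and have never posted or commented."""
    now = now or datetime.now(timezone.utc)
    return [
        acct.name
        for acct in accounts
        if not acct.email_verified
        and now - acct.created > grace
        and acct.post_count == 0
        and acct.comment_count == 0
    ]


if __name__ == "__main__":
    now = datetime(2023, 7, 1, tzinfo=timezone.utc)
    accounts = [
        LocalAccount("bot123", now - timedelta(days=30), False, 0, 0),
        LocalAccount("alice", now - timedelta(days=30), True, 4, 20),
        LocalAccount("justjoined", now - timedelta(days=1), False, 0, 0),
    ]
    print(junk_signups(accounts, now=now))  # prints ['bot123']
```

Review the list before deleting anything; the grace period gives legitimate users who are slow to click the verification link a chance.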
Browser fingerprints are easy enough to block or mimic, though, at least for the solutions I've messed with. JavaScript-based solutions in particular are tricky because of the privacy implications and the fact that decent privacy-focused browsers are starting to block those things automatically.