Instance Meta - Is this instance getting flooded with spam bot accounts?
So, lemmy seems to be flooded with spam bot accounts at the moment. Look through the table of servers on fedidb (https://fedidb.org/software/lemmy) and notice how there are these huge instances without any active users (MAU).
Also notice how startrek.website has 9000 users for 276 active users this month.
From memory, when I signed up, there was no email requirement or captcha or anything.
Just a member here, but given the recent rapid growth in the past 10 days as folks migrate over from a 600k subreddit (myself included), and the normal 90% lurker rule-of-thumb, this is actually a fairly reasonable monthly active user ratio.
I genuinely understand the concern to avoid bots and trolls, but have admins in other instances actually documented a significant number of bots originating through this instance?
Thank you for your care of this place, and due diligence.
With the instance growing so quickly, aggregate stats can flag a possible problem but are vulnerable to aliasing.
I very much appreciate the path to lemmy membership that you offered to us in the final hours before many of the subreddits went dark.
The ratcheting chain of new members joining this and other Lemmy instances is still going. I have the sense that there’s still a lot of private chat on Reddit as folks looking for a new place to be are enquiring of those who’ve already migrated.
All to say, when it is sorted, it would be great to have an updated sticky with the details so those of us here can help the less tech-adept among us make the transition.
You have a point, especially as lemmy defines "active" as a user that has at least posted once within the relevant time period. So yes, lurkers definitely wouldn't count toward the active user count (mastodon and the like use different metrics AFAIU).
‘Posted’ is even a higher bar than ‘commented’, a 10th of a 10th.
Not to say that there isn’t a need to filter out sockpuppets and bots at signup, but rather that the inference being made from the stats was not obvious.
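For anyone who wants to sanity-check the numbers being argued over, here is a rough back-of-the-envelope sketch. It only uses the figures quoted in this thread (9000 users, 276 active) and the two rules of thumb mentioned above (roughly 10% of users comment, and posting is "a 10th of a 10th"); those percentages are folk estimates, not measurements.

```python
# Figures quoted in this thread for startrek.website.
total_users = 9000
monthly_active = 276

# Share of registered accounts that counted as active this month.
active_ratio = monthly_active / total_users

# Rules of thumb from the thread (assumptions, not measured values):
# about 1 in 10 users comments, and about 1 in 100 posts.
expected_commenters = total_users / 10
expected_posters = total_users / 100

print(f"active ratio: {active_ratio:.1%}")
print(f"expected at 1 in 10: {expected_commenters:.0f}")
print(f"expected at 1 in 100: {expected_posters:.0f}")
```

The observed count lands between the two estimates, which is why the ratio on its own does not settle whether bots are present.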
Just a quick update for everyone, yes OP is right and a bunch of bots signed up. We've purged them from our user count and enabled CAPTCHA. Email verification is coming soon as a secondary deterrent.
For the record nobody told us that it's not safe out here. We were aware that self-hosting was wondrous, with treasures to satiate desires both subtle and gross; but had NO IDEA that it wasn't for the timid.
😉
We deleted them from the local_user database table outright based on some sketchy shared attributes, and then manually updated the user count in site_aggregates to the correct figure so our stats wouldn't look so sketchy.
Pretty simple for anyone comfortable in SQL who knows where to look (a helpful user DM'd and gave us a hand here), but not something anybody should try willy nilly if they don't know what they are doing. Editing production data on the fly is not to be done casually.
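To illustrate the "don't do it casually" part, here is a minimal sketch of the general pattern: delete inside a transaction, check how many rows matched, and roll back if the number looks wrong. It runs against a toy in-memory SQLite database, not a real Lemmy install (Lemmy uses Postgres). Only the table names `local_user` and `site_aggregates` come from the description above; the columns, the sample rows, the `suspect_filter` condition, and the `purge_suspects` helper are all made up for illustration. It also recomputes the count instead of typing it in by hand, which is a choice made for this sketch and not necessarily what was done here.

```python
import sqlite3

# Toy in-memory stand-in. The columns below are simplified assumptions,
# not Lemmy's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE local_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE site_aggregates (users INTEGER);
    INSERT INTO local_user (email) VALUES
        ('picard@example.org'), ('junk@junk'),
        ('spam@spam'), ('data@example.org');
    INSERT INTO site_aggregates (users) VALUES (4);
""")

# Hypothetical "sketchy shared attribute": an email with no dot after the @.
suspect_filter = "email LIKE '%@%' AND email NOT LIKE '%@%.%'"


def purge_suspects(conn, expected_max):
    """Delete suspect rows and fix the user count, or roll back everything."""
    try:
        cur = conn.execute(f"DELETE FROM local_user WHERE {suspect_filter}")
        # Safety valve: abort if far more rows matched than we expected.
        if cur.rowcount > expected_max:
            raise RuntimeError(f"{cur.rowcount} rows matched; not committing")
        conn.execute(
            "UPDATE site_aggregates "
            "SET users = (SELECT COUNT(*) FROM local_user)"
        )
        conn.commit()
        return cur.rowcount
    except Exception:
        conn.rollback()  # leave the data exactly as it was
        raise


removed = purge_suspects(conn, expected_max=2)
remaining = conn.execute("SELECT users FROM site_aggregates").fetchone()[0]
print(removed, remaining)
```

On a real server you would take a backup first and run the matching `SELECT` on its own before deleting anything.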
Admin of tucson.social here - I haven't noticed an attack on my instance yet but I do have Captcha AND Email validation turned on.
Since my instance is for Arizonans only, I could do a geo-ip block if pressed, but obviously that won't work for places like startrek.website.
If any admin needs assistance, I recommend enlisting some help over at programming.dev - likely the best instance for collaborating on our lemmy servers.
Geoblocking is also a bit of a blunt instrument: many people use network-wide VPNs, and sometimes an ISP's IP blocks are mislocated (my work ISP has my IP in a different state).
For sure! If I were running a more general and globally focused instance that would be a larger concern. I understand using a VPN in North America, but not so much from other countries. I guess my vision is that it's only really locals accessing the site for the most part. If someone is travelling out of the country, they can equivalently use a U.S. based VPN server.
I suppose my example of Arizona came across as the proposed bounds of my geoblock. I'd probably just say "North America" to avoid the issues of remote workers using a company VPN to access the site (please don't though, your company probably doesn't like that - the current version of lemmy is VERY bandwidth inefficient).
Also, consider that I can use one geoblock for my signup page and a different, more permissive one for the login page, which should make things a bit more reasonable.
I just closed my registration, was onboarding it and syncing up communities in prep for a 7/1 rush. Haven’t seen any attempts yet. But will probably just work out a kbin instance and move on. Too much drama with the lemmy devs.
Agreed, and my one call to action post to get other Admins to give a crap fell on its face over on beehaw. It seems that many admins really think that every instance should use manual registration, or other tools. All in all, the message I got was "The devs don't have to listen to anyone".
I'm now of the opinion that most lemmy admins aren't people I want to associate with, they seem to be all about "open source" until it collides with concepts like "collective responsibility" and you'll get a response in the individualist line of reasoning of "Oh, just fix it yourself".
Yep. Been seeing this on mine and a few other instances. They are random word + numbers. They are using bogus emails which cause bounces. One instance that was hit got their mail service locked out because of bounced emails.
I enabled captcha and manually deleted the users that had signed up. It's very hard to list users in local instances.
I had this issue as well on my instance. Here's how I fixed it with SQL commands included. TL;DR Turn on CAPTCHAs, don't use email verification (as they will spam the shit out of it), and use SQL commands to ban all of them in one fell swoop.
Yo thanks for this. I'm not rlly good with dbs and was trying to figure out how to mass get rid of some bot accounts. Luckily I noticed B4 it got super bad
Yeah, I heavily modified and expanded upon someone else's query to seek out and destroy more of the accounts. Theirs is basically pattern-matching some of the Gmail-with-numbers spam, but there's a subset using junk@junk with no actual .TLD to try and get people's email verification to bounce. Someone else said that ended up in people getting their email relay account suspended, hence why email verification (at least without CAPTCHAs) is a fairly bad idea. I added a table join and some extra matching to find some of those extra bogus "emails," which typically results in quite a few more accounts being banned. There are two major caveats with my method: 1) it doesn't delete the accounts, which is really just a simple modification to the query to "fix," and 2) it doesn't deal with spam accounts that have no email attached, although those seem to be a fairly small subset of the account spam. I'll see if there is an easier way to deal with those, but getting most banned or deleted is still pretty easy.
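The matching described above can be sketched outside the database too. This is a rough illustration in Python, not the actual query: the exact shape of the "Gmail-with-numbers" addresses isn't given here, so the regex (letters followed by two or more digits) is a guess, and `classify_email` and its labels are hypothetical names.

```python
import re

# Assumed shape of the "Gmail-with-numbers" spam: letters, then 2+ digits.
GMAIL_WITH_NUMBERS = re.compile(r"^[a-z]+\d{2,}@gmail\.com$", re.IGNORECASE)


def classify_email(email):
    """Return a rough label for a signup email (hypothetical categories)."""
    if not email:
        # Accounts with no email attached: the caveat the query doesn't cover.
        return "no_email"
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return "malformed"
    if "." not in domain.strip("."):
        # junk@junk style: nothing after the @ that looks like a TLD.
        return "no_tld"
    if GMAIL_WITH_NUMBERS.match(email):
        return "gmail_with_numbers"
    return "ok"


samples = ["junk@junk", "bluefish48213@gmail.com", "picard@example.org", None]
for address in samples:
    print(address, "->", classify_email(address))
```

Plenty of real people have Gmail addresses ending in numbers, so a match on that pattern is only a reason to look closer, not proof of a bot.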
rabbitea.rs appears to be entirely made for spam accounts and I have suggested to my instance that we ban it only because it has no genuine activity that I can see.