Due to the recent spam waves affecting the Fediverse, we'd like to open requests for comment on the use of automated moderation tools across Pawb.Social services.
We have a few ideas on what we'd like to do, but want to make sure users would feel comfortable with this before we go ahead with anything.
For each idea below, please let us know whether you consider the use-case acceptable or not, and if you feel like sharing additional info, we'd appreciate it.
1. Monitoring of Public Streaming Feed
We would like to set up a bot that monitors the public feed (all posts with Public visibility that appear in the Federated timeline) and flags any posts that match our internally defined heuristic rules.
Flagged posts would be reported as normal from a special system-user account, but the reports would not be forwarded to remote instances, so that any false positives stay local.
These would be fixed rules based on metadata from the posts (account indicators, mentions, links, etc.), and not, per se, on the content of the posts themselves.
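To make this more concrete, here is a minimal sketch of what fixed, metadata-only rules could look like. Everything in it is an assumption for illustration: the `PostMetadata` fields, the rule names, the thresholds, and the `spam.example` domain are hypothetical and are not our actual rules.

```python
from dataclasses import dataclass, field


@dataclass
class PostMetadata:
    """Hypothetical metadata for one public post; the post body is not stored."""
    account_age_days: int
    follower_count: int
    mention_count: int
    link_domains: list[str] = field(default_factory=list)


# Illustrative values only; the real rule set is internal and not shown here.
BLOCKED_DOMAINS = {"spam.example"}

RULES = {
    "new_account_mass_mentions": lambda p: p.account_age_days < 2 and p.mention_count >= 5,
    "blocked_link_domain": lambda p: any(d in BLOCKED_DOMAINS for d in p.link_domains),
    "no_followers_with_links": lambda p: p.follower_count == 0 and len(p.link_domains) > 0,
}


def matched_rules(post: PostMetadata) -> list[str]:
    """Return the names of every rule that the post's metadata trips."""
    return [name for name, rule in RULES.items() if rule(post)]


def should_flag(post: PostMetadata) -> bool:
    """Flag the post for a local-only report when at least one rule matches."""
    return bool(matched_rules(post))
```

Returning the names of the matched rules, rather than a bare yes/no, would let the report tell moderators why a post was flagged.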
2. Building of a local AI spam-detection model
Taking this a step further, we would like to experiment with using TensorFlow Lite and Google Coral Edge TPUs to build a fully local model, trained on the existing decisions made by our moderation team. To stress, the model would be local only and would not share data with any third party or service.
This model would analyze the contents of the post for known spam-style content and identifiers, and raise a report to the moderation team where it exceeds a given threshold.
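As a rough sketch of the thresholding step only: the model itself is left abstract here as a `score_fn` callable returning a spam score between 0 and 1, and the `REPORT_THRESHOLD` value and the report fields are assumptions, not settings we have chosen.

```python
from typing import Callable, Optional

REPORT_THRESHOLD = 0.9  # assumed value for illustration; no threshold has been decided


def maybe_report(post_text: str,
                 score_fn: Callable[[str], float],
                 threshold: float = REPORT_THRESHOLD) -> Optional[dict]:
    """Return a report for the moderation team if the score exceeds the threshold."""
    score = score_fn(post_text)
    if score > threshold:
        # The model only raises a report; a moderator still makes the decision.
        return {"score": score, "excerpt": post_text[:80]}
    return None
```

A higher threshold would mean fewer false positives reaching moderators at the cost of missing more spam, so the value would need tuning against real moderation decisions.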
However, we do recognize that this would result in us processing posts from remote instances and users, so we would commit to not using any remote posts for training unless they are identified as spam by our moderators.
3. Use of local posts for non-spam training
If we see support for #2, we'd also like to ask users, on a voluntary basis, for permission to provide their posts as "ham" (non-spam / known-good posts) to the spam-detection model.
While new posts would be run through the model, they would not be used for training unless you give us explicit permission to use them in that manner.
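Put together with the commitment in #2, the training rules could be sketched as below. The `Post` fields are hypothetical names, and treating any moderator-confirmed spam (local or remote) as usable for training is our reading of the commitments above rather than a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    is_local: bool               # authored on one of our instances
    moderator_marked_spam: bool  # a human moderator identified it as spam
    author_opted_in: bool        # author gave explicit permission for ham training


def training_label(post: Post) -> Optional[str]:
    """Return "spam", "ham", or None when the post must not be used for training."""
    if post.moderator_marked_spam:
        return "spam"
    if post.is_local and post.author_opted_in:
        return "ham"
    # Remote non-spam posts and local posts without consent are never trained on.
    return None
```

Posts that return `None` could still be scored by the model; they simply never enter the training set.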
I'm hoping this method will allow users who feel comfortable with this to assist in development of the model, while not compelling anyone to provide permission where they dislike or are uncomfortable with the use of their data for AI training.
4. Temporarily limiting suspected spam accounts
If our heuristics and / or AI detection identify a significant risk or pattern of spammy behavior, we would like to be able to temporarily hide / suppress content from the offending account until a moderator is able to review it. We've also suggested an alternative idea to Glitch-SOC, the fork we run for furry.engineer and pawb.fun, to allow hiding a post until it can be reviewed.
Limiting the account would prevent anyone not following them from seeing posts or mentions by them, until their account restriction is lifted by a moderator.
In a false-positive scenario, an innocent user may not have their posts or replies seen by users on furry.engineer / pawb.fun until their account restriction is lifted, which may break existing conversations or prevent new ones.
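A minimal sketch of how such a limit might behave, purely for discussion. The `Account` type and function names are hypothetical and are not Mastodon or Glitch-SOC APIs; letting authors still see their own posts, and the "suspend" outcome, are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    handle: str
    followers: set[str] = field(default_factory=set)
    limited: bool = False


def can_see_posts(viewer: str, author: Account) -> bool:
    """While limited, only existing followers (and the author) see the posts."""
    if not author.limited:
        return True
    return viewer == author.handle or viewer in author.followers


def apply_temporary_limit(account: Account) -> None:
    """Automated step: hide the account from non-followers pending review."""
    account.limited = True


def moderator_review(account: Account, is_spam: bool) -> str:
    """Human step: only a moderator's decision ends the temporary state."""
    if is_spam:
        return "suspend"  # hypothetical follow-up action
    account.limited = False
    return "restored"
```

The automated step only ever sets the limit; lifting it, or escalating it, stays with a moderator.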
We'll be leaving this Request for Comment open-ended to allow for evolving opinions over time, but are looking for initial feedback within the next few days for Idea #1, and before the end of the week for ideas #2 through #4.
@crashdoom I'm generally against automated moderation, having been shadowbanned on other platforms for no reason I can identify. These scripts are never infallible, no matter how well intentioned. A computer can be trained to recognise keywords but it can never understand context.
Having said that, I do appreciate the urgency to do something. If you do go ahead with it, I would ask the following:
- Make sure the user is informed of any action, never use shadowbans.
- Make sure there is easy access to human review in the event mistakes do occur.
@RavenLuni @crashdoom Yeah I agree. Automated moderation systems can cause a lot of problems when they ban or limit without human interaction.
If they do though, they need to inform the user of the actions performed, and there needs to be an easy way to appeal them, so they aren't just baseless automated bans like on every mainstream service.
I've upvoted this but I'd just like to chuck in that I think Raven makes a lot of sense here. I've had posts deleted or hidden by automod bots on other sites and even when they're restored they don't get as much traction as the posts which were left alone. So there's an effect even if the action can be "reversed" - and I say that in quotes because it's not like you can turn the clock back.
Hard agree on the no use of shadowbans and keeping users informed, and the easy escalation to a human.
My ideal would be some kind of system which looks at the public feed for keywords and raises anything of concern to an admin, and maybe the admin's response goes back in as 'training'. Something more like SpamAssassin's Bayesian ham/spam classifier perhaps.
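As a rough illustration of that idea, here is a plain naive Bayes ham/spam classifier where each admin decision is fed back in through `train`. It is a sketch under simple assumptions (whitespace tokenization, add-one smoothing) and is not how SpamAssassin itself is implemented.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one admin decision; label must be "spam" or "ham"."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | words) using add-one smoothing."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed class prior, so an untrained class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a normalized probability of spam.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(diff))
```

Anything scoring high would be raised to an admin rather than acted on, and the admin's verdict would go back in with `train(text, "spam")` or `train(text, "ham")`.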
I don't think automated actions without a human in the loop are the right way to go, and I have grave concerns about biases creeping into the model over time. The poster child for this is pretty much Amazon's HR résumé review system, which ended up with racist biases. There's been a lot of good progress improving PoC/BIPOC/BAME/non-white acceptance and it'd be a shame if something like this accidentally ended up scarring or undoing some of that.