This isn't an announcement, I just wanted people's thoughts on this.
I think everyone knows searching the fediverse can be better. Googling doesn't work too well, etc. So I wanted to do my part and help out.
Indexing all posts, etc. is quite a lot to handle, so I wanted to start small and just focus on video search. I've started indexing videos from PeerTube and other video websites. (Even YouTube, but this could be removed to just focus on independent sites.)
I know PeerTube has their own search engine for videos. I will be reaching out to them. Compared to theirs, I'm planning for my site to have other video sources and be easier to use.
So that leads to feedback from you guys.
What do you think about indexing videos posted on the fediverse and other independent platforms?
Yes, but moderation teams on the fediverse are very small, and by the nature of it, trolls can make hundreds of accounts on different servers, all trolling, that would need to be individually sought out and banned.
That post wasn't claiming that a search engine would only be used by trolls; it was explaining that they shut down their project because a chunk of the fediverse thinks that and complains about any search engine project. Discoverability is one of the network's biggest challenges and a search engine could really help with that.
Yes, not only used by trolls, but it would be a tool that could be leveraged by trolls. And I think the fediverse makes it easier to establish instances for marginalized groups, but also has more admins that just don't want trolls, because nobody here is making $ off them like the corporate socials are. I think adding a search that is going to try and vacuum up everyone's posts in the fediverse and make them easily sortable/targetable without instance admins' permission isn't cool. If someone is running a general instance that covers nothing that a troll could latch onto and wants the instance catalogued and searchable, then that's fine by me. I don't think bots should be doing that to the fediverse as a whole without admin permission though.
But they are correct. There are vulnerable groups of people who face a harassment risk. We share the fediverse with others, so be mindful of that. Making a search engine or an archiver for Lemmy is such a good idea with how it functions! But for the wider fediverse, that's just directly contradictory to its culture unless it can be opted into by instances and users.
Yes, anything you post on the internet can be indexed. If someone wants to post something in their little private garden, there are options for that too. The fediverse has potential to grow, and if we try to stop everything that could help it grow with "no, only trolls will use it", after some point nobody except the people who complain will use it. Do you want the fediverse to be your own little echo chamber?
Well, please make sure it respects post privacy at least, but also realize that on the microblogging side of the fediverse, they may not take kindly to this prospect at all. People who start these kinds of projects are often harassed or at least receive passive hostility. Making it opt in instead of opt out in some capacity is best.
I disagree. Post privacy sure, but the internet is by definition public. Anything you put out there can be used for pretty much everything, the original rules of the internet apply. I'd be happy to see an easy opt out on the engine to remove yourself, but if everything is opt in it'll never get off the ground.
As the fediverse is almost exclusively run by volunteers that are paying server bills and being admins, I could see some larger instances not taking kindly to this, especially depending on how much stress it would be putting on some already at capacity servers.
That's not how the fediverse functions and approaching it that way is a problem waiting to happen. I'm stating so as a warning to be mindful of the culture of the way the fediverse itself functions. This is not Reddit, we share the fediverse with other software with different uses and features and we need to be mindful of that especially when building these kinds of tools. Making it opt out not only places a burden on smaller instances but presents a potential harassment risk for instances with vulnerable people on other fediverse platforms. As well, it is contrary to the entire way specific other activitypub instances operate. The fediverse is like a city we share with others, if Lemmy is not mindful of that city's culture then people will promptly give them the boot.
I'm not saying user by user opt in either, but instance by instance. Lemmy especially needs an archiving tool. There are already cultural clashes I see occurring with the rest of the fediverse. Posts like these about potential tools, when it seems like the creator doesn't know the messy history behind previous projects like them in the fediverse, make me fearful of the clashes coming to fruition.
A good search engine would be quite important. One thing that annoyed me back on the site that should not be named was that their search engine was completely useless - it couldn't even find posts when I entered verbatim text from them.
Having a good search engine that can actually find a post I was looking for would be a major plus for the fediverse.
Why wouldn't people want to have a search engine? Without it the fediverse stands no chance against the non-free internet. Everything posted here would be much more valuable if it was searchable. Right now a comment, once posted, is viewed only until the post gets less popular. Any other site of this kind displays answers decades old. Privacy isn't an issue, as everything posted here is available to everyone on the internet.
I love the idea, especially from a technical standpoint!
How big is the fediverse today? How many posts are there? What kind of algorithms are you using to store the results? Do you scan sites and then their connected sites, or do you have a premade list?
The fediverse is a few thousand servers, from Mastodon, Lemmy, etc. Can't say the amount of posts but there are a lot.
So on the more technical side, I plan on using a lightweight, fast search engine called Sonic (it's written in Rust). I have already used it in other projects and it can handle billions of messages / posts. But it has a cost: it doesn't have faceted search, for example if you want to exclude certain texts from the results. I think this is a fair trade-off. The other solution would be to use something more mature like Elasticsearch, but it'll be expensive (I'm assuming not much money will be made from this, and I'm talking about donations).
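To make the trade-off concrete, here is a rough sketch of one way to work around a backend that can't exclude terms: filter the results after they come back. This does not use Sonic's actual API; the `results` list and the `exclude_terms` helper are made up for illustration, and a real setup would have to fetch extra results to make up for the ones that get dropped.

```python
def exclude_terms(results, excluded):
    """Drop results whose title contains any excluded term (case-insensitive)."""
    lowered = [term.lower() for term in excluded]
    return [
        r for r in results
        if not any(term in r["title"].lower() for term in lowered)
    ]


# Hypothetical results, standing in for whatever the search backend returned.
results = [
    {"id": "v1", "title": "Rust tutorial for beginners"},
    {"id": "v2", "title": "Rust removal from old cars"},
    {"id": "v3", "title": "Intro to PeerTube"},
]

filtered = exclude_terms(results, ["cars"])
print([r["id"] for r in filtered])
```

The downside is that the filtering happens outside the index, so it costs more per query than a backend that supports exclusion natively.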
For scanning sites there are premade lists to start with and it'll be possible to scan new sites from other instances if found. So a bit of both.
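As a rough sketch of the "bit of both" approach: start from a premade seed list and follow whatever other instances each one points to. Everything here is an assumption for illustration. `get_peers` is a hypothetical callback that returns the instances a host knows about, and `FAKE_PEERS` is stand-in data so the example runs without touching the network.

```python
from collections import deque


def discover_instances(seeds, get_peers, max_instances=1000):
    """Breadth-first walk: visit the seed list, then any newly found peers."""
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_instances:
        host = queue.popleft()
        visited.append(host)
        for peer in get_peers(host):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return visited


# Stand-in data; a real crawler would ask each instance for its peers.
FAKE_PEERS = {
    "video.example": ["tube.example", "clips.example"],
    "tube.example": ["video.example", "media.example"],
}

found = discover_instances(["video.example"], lambda h: FAKE_PEERS.get(h, []))
print(found)
```

The `max_instances` cap is there so a discovery run can't grow without bound.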
I support the bigger picture. Rather than an independent site, wouldn't it be more practical to work with current fediverse app developers for Lemmy, Mastodon, etc. to integrate a search engine within the app?
I don't know anything about the technical side of this. But I would (possibly naively) think that it would be simpler to have a filter that you could automatically apply to sift bog-standard search engine results for Fediverse instances? Like adding "site:uk" to the end of a normal search, except that your filter term would check a list of Fediverse instances to return the relevant results.
And make it an app/add-on so that people can use it with their usual search strategies.
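The filter idea could look something like this minimal sketch: take the URLs a normal search engine returned and keep only the ones whose host is on a list of known fediverse instances. The instance list and URLs below are made up, and how you'd get results out of a regular search engine is left open.

```python
from urllib.parse import urlparse


def only_fediverse(urls, instances):
    """Keep URLs whose hostname is in the list of known instances."""
    known = {host.lower() for host in instances}
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host in known:
            kept.append(url)
    return kept


# Hypothetical instance list and search results.
instances = ["lemmy.example", "tube.example"]
urls = [
    "https://lemmy.example/post/123",
    "https://bigcorp.example/watch?v=abc",
    "https://Tube.Example/videos/42",
]

print(only_fediverse(urls, instances))
```

This only matches exact hostnames, so subdomains of a listed instance would need their own entries.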
Just, for the love of god, make sure to make it opt-in. Don't scan people's posts without their consent.
Also, if you're going with this, make sure to respect people's requests for removal, e.g. deleting their posts from the engine when they delete them from their instances. Otherwise you'd get in trouble with the EU regarding GDPR...
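A minimal sketch of what opt-in plus removal could look like, assuming opt-in is tracked per instance. The `OptInIndex` class and its methods are hypothetical and in-memory only; it shows the idea, not a real engine, and on its own it doesn't settle any legal requirement.

```python
class OptInIndex:
    """Toy index that only accepts posts from instances that opted in."""

    def __init__(self, opted_in_instances):
        self.opted_in = set(opted_in_instances)
        self.posts = {}  # (instance, post_id) -> text

    def add(self, instance, post_id, text):
        """Index a post; refuse it if the instance has not opted in."""
        if instance not in self.opted_in:
            return False
        self.posts[(instance, post_id)] = text
        return True

    def remove(self, instance, post_id):
        """Drop a post, e.g. after it was deleted on its home instance."""
        return self.posts.pop((instance, post_id), None) is not None

    def search(self, term):
        """Return the keys of posts containing the term (case-insensitive)."""
        term = term.lower()
        return sorted(k for k, text in self.posts.items() if term in text.lower())


index = OptInIndex(["tube.example"])
index.add("tube.example", "1", "Cat video compilation")
index.add("private.example", "2", "Cat video, not for indexing")
print(index.search("cat"))
index.remove("tube.example", "1")
print(index.search("cat"))
```

In practice the removal would be triggered by a delete notification from the instance rather than a manual call.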