Lemmy @lemmy.ml carbotect @vlemmy.net 1y ago

Is Lemmy search-engine unfriendly?

Any post or community can be accessed through a theoretically limitless number of instances, which also means a theoretically limitless number of URLs.

Will this hinder Lemmy from ever coming into the mainstream? If I type any topic into Google, I will get a Reddit thread that deals with it. Can something like that ever happen for Lemmy?

29 comments

I think if canonicals are applied correctly, it should not be an issue?
- I think you're right. Looking at the HTML source for this page, I don't see a canonical tag, though. Maybe they haven't added it yet? Or I missed it.
  
  Would the canonical tag make any sense for Lemmy? The problem is that when you search for something, your preferred site/URL is your own instance. So the canonical would be different for every user?
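
For anyone wondering what "looking for a canonical tag" means in practice, here is a minimal sketch in Python. It assumes a mirrored copy of a post declares its home instance with a standard `<link rel="canonical">` element; whether Lemmy actually emits one is exactly what's being asked above. The URL and the `find_canonical` helper are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page served by a mirroring instance, pointing at the home instance.
mirror_page = """<html><head>
<link rel="canonical" href="https://lemmy.ml/post/123">
</head><body>mirrored copy</body></html>"""

print(find_canonical(mirror_page))
```

Note that the canonical is a property of the page, not of the reader: every mirror would point at the same home-instance URL, so it would not differ per user.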
Currently it appears that a non-logged-in user (try an incognito window!) will only see posts on a particular server's local communities. So a search engine bot crawling multiple Lemmy servers will only see duplicates if they've been explicitly crossposted.
- No, you can definitely browse "all" while logged out. It just defaults to local.
- Yes, I'm sure that it won't take too long for search engines to cotton on and start indexing Fediverse stuff. After all, it's going to be getting linked from other sites, which will cause the spider to head to the source URL and start having a gander.
This is in fact my biggest worry about Lemmy's future. People need to be able to search for stuff, and I currently don't see how they can.
- I'm trying to build a search engine specifically for Lemmy, and I'm doing tests in the next couple of days.
  
  It should in theory work similar-ish to Google / Bing.
  
  You can filter by instance, community or author.
  
  It only indexes Lemmy posts and it won't keep duplicates.
  
  It'll also open any link you find in your instance.
  
  You'll be able to self host it and point it to any instance you want as well.
  
  I'm hoping I can open it to the public in a week or so.
  
  Cool! How does it technically work? Does it fetch all titles (and maybe the body and comments) via the API from each instance, or do you set up your own private instance and tap into the instance database?
  
  Would it be possible to also integrate kbin?
  
  Please make sure that you're only indexing Lemmy communities and Kbin magazines (i.e. not microblogs)
  
  In the wider fediverse, there is an actual expectation of privacy beyond "well it's technically possible to scrape everything so we may as well give up". Several people (with reasons of innocent naivete & explicit and blatant malice alike) have tried making fediverse search engines, but all of them are either dead or blocked.
  
  Lemmy/Kbin is in a unique position where global search does make some sense to have, due to it being a public forum focused on topics (and not people), but there is a very real chance that assholes could use an "unbounded" fediverse search engine to find vulnerable people (quite a few of them specifically fleeing to the fediverse to avoid that kind of problem) and harass them.
  
  sounds promising, can't wait to test it
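
The "won't keep duplicates" part of the search engine idea above could look something like this. It's only a sketch under assumptions, not how that project actually works: it assumes each fetched post carries an ActivityPub ID (called `ap_id` here) that stays the same no matter which instance served the copy. The sample records and the `seen_on` field are invented.

```python
def dedupe_posts(posts):
    """Keep the first copy of each post, keyed by its (assumed) ActivityPub ID."""
    unique = {}
    for post in posts:
        key = post["ap_id"]  # assumption: identical across all mirrored copies
        if key not in unique:
            unique[key] = post
    return list(unique.values())


# Invented sample: the same post seen via two instances, plus one other post.
fetched = [
    {"ap_id": "https://lemmy.ml/post/1", "seen_on": "lemmy.ml", "name": "Search and Lemmy"},
    {"ap_id": "https://lemmy.ml/post/1", "seen_on": "vlemmy.net", "name": "Search and Lemmy"},
    {"ap_id": "https://vlemmy.net/post/7", "seen_on": "vlemmy.net", "name": "Another post"},
]

for post in dedupe_posts(fetched):
    print(post["ap_id"], "first seen on", post["seen_on"])
```

Keying on the home-instance ID rather than the URL you happened to crawl is the same idea as a canonical tag, just applied on the indexer's side.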
It's probably the search engine that is unfriendly to Lemmy and others.
- I like your philosophical view of the matter
I think so, at least compared to more centralised sites, since there's no index or aggregator that concentrates that information in a place that can be easily accessible.

Unlike with a place like Reddit, I can't just stick "Lemmy" or "site:lemmy.ml" on the end of my query and expect to get all the information across all the instances. At best, I would be able to search what's contained within an individual instance, which makes searching more trouble, since you have to know the server and community you're looking for first.
- What's contained within an individual instance is whatever's viewable from that instance, though, as remote content is mirrored locally. So, any early instance that is well federated and subscribed to a large number of remote communities should work well as a search target.
  
  So site:lemmy.ml actually shouldn't end up being too bad of a query.
