A great use for reddit is the ability to search posts and opinions about any niche topic. Will that be possible with Lemmy as it grows? Will I be able to Google "instant rice Lemmy" and get a comprehensive tier list of each brand?
I imagine search engines will have trouble with all the different instances(?). EDIT: Especially with instances that don't have Lemmy in their name, I don't think search engines would return them for Lemmy searches?
As I see it, Google and others are going to have a hard, if not impossible, time incorporating the fediverse, given that the same content can exist on multiple servers.
So I'm working on a search engine built specifically for Lemmy, at least for now. It'll take you to whatever your preferred instance is when you tap on a search result.
I hope to have an MVP up and running in a few more days.
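For anyone wondering what "take you to your preferred instance" could look like, here is a rough sketch for community links only. It is not code from the project mentioned above. It assumes the usual Lemmy web UI convention of addressing a remote community as /c/name@host, and the function name to_home_instance is made up for illustration. Posts are left out because their numeric IDs differ from instance to instance.

```python
from urllib.parse import urlparse


def to_home_instance(community_url: str, home: str) -> str:
    """Rewrite a community URL so it opens on the reader's home instance.

    Hypothetical helper: assumes paths of the form /c/<name> or
    /c/<name>@<host>, which is how the Lemmy web UI addresses communities.
    """
    parts = urlparse(community_url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[0] != "c":
        raise ValueError(f"not a community URL: {community_url}")

    name = segments[1]
    if "@" in name:
        # Already a remote-style link: split off the origin host.
        name, origin = name.split("@", 1)
    else:
        # Local-style link: the community lives on the host being visited.
        origin = parts.netloc

    if origin == home:
        # The community is hosted on the home instance, no suffix needed.
        return f"https://{home}/c/{name}"
    return f"https://{home}/c/{name}@{origin}"


if __name__ == "__main__":
    print(to_home_instance("https://lemmy.world/c/cooking", "beehaw.org"))
```

A search result pointing at lemmy.world would then open through beehaw.org for a beehaw.org user, where they are already logged in.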
Yep and I'm one of them. Go look me up on Reddit and I think I have maybe 20 posts over the 14+ years I was on the site. ...joined Lemmy and immediately got frustrated that I couldn't find anything. So I figured I'd take a crack at it. Especially since I couldn't see how Google would ever be able to link me to my instance. Let alone make it easy to search the entire fediverse without having to write out every possible site, with new ones popping up every day.
Interesting. I hadn't even thought about how the fact that instance1.[post] and instance2.[post@instance1] is essentially the same thing and how search engines would handle it. Interested in what you come up with!
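One way a crawler might treat instance1.[post] and instance2.[post@instance1] as the same thing is to group copies by the ID of the original. This is only a sketch under assumptions: it supposes every crawled record carries the original's ActivityPub ID in a field called ap_id (as far as I know Lemmy's API exposes something like that on posts, but treat the field name as an assumption), and seen_at plus the example.* hosts are invented for illustration.

```python
from collections import defaultdict


def group_by_origin(records):
    """Group crawled copies of a post under the ID of the original post.

    Each record is a dict with two assumed keys:
      "ap_id"   - ID of the original post (assumed field name)
      "seen_at" - URL where the crawler found this copy (hypothetical)
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["ap_id"]].append(rec["seen_at"])
    return dict(groups)


# Made-up crawl results: the first two rows are the same post seen twice.
records = [
    {"ap_id": "https://instance1.example/post/42",
     "seen_at": "https://instance1.example/post/42"},
    {"ap_id": "https://instance1.example/post/42",
     "seen_at": "https://instance2.example/post/9001"},
    {"ap_id": "https://instance2.example/post/7",
     "seen_at": "https://instance2.example/post/7"},
]

if __name__ == "__main__":
    for origin, copies in group_by_origin(records).items():
        print(origin, "has", len(copies), "copies")
```

The index would then hold one entry per original post and keep the list of copies around for picking which instance to link to.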
Thanks. If you do some digging you can find the project on GitHub but note that it's a work in progress still. The UI is lacking and it's rough around the edges but it's "working". And I still need to do some optimizations on the crawler itself, etc....
It's also going to be completely self-hostable just like Lemmy, etc...
IDK, isn't it the same for reddit? It also encourages crossposting, so the same content is on there several times. Maybe I don't understand the fediverse well enough yet, so please correct me if I'm wrong.
Ya I only index Lemmy instances, for now. Mastodon and other ActivityPub servers may be in the future, but ActivityPub has some limitations so I'm stuck using the Lemmy-specific APIs.
That is great. Thanks for the initiative. Have you considered contacting the people at DuckDuckGo so that that search engine can access Lemmy/Kbin content?
You can use a search query to include only results with Lemmy's footer, which is consistent across all Lemmy instances. I made a post about it here: https://lemmy.world/post/342365
If Lemmy becomes a source of enough information like Reddit is, search engines will index it. SEO is a marketing thing, and a place like Lemmy doesn't really need that. Google, DDG, etc. all put engineering effort into making sure sites with lots of information are indexed and available in their search results.
In the future they eventually might be, for some instances. Though definitely not for all of them, since some of the instances might disable indexing.
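One common way a site opts out of crawling is its robots.txt file, so a polite crawler could check that before indexing an instance. A minimal sketch using Python's standard urllib.robotparser follows. The two rule sets are made-up samples, not taken from any real instance, and may_index is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Sample rule sets (hypothetical): one blocks everything, one allows everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    print(may_index(BLOCK_ALL, "https://instance.example/post/1"))
    print(may_index(ALLOW_ALL, "https://instance.example/post/1"))
```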
I've actually already seen a few Lemmy results (lemmy.ml) in Google searches. The trouble is that they don't link to individual posts, just the community, so they're not particularly useful. So it definitely is possible; it just needs to improve to the point of indexing posts.
It’s up to the individual instance owner, and to Lemmy the software itself, to enable SEO. It’s just getting started now, so it will be a long time before that happens.
Digg.com was the big thing with Reddit trailing. Digg began tweaking the experience toward a more profitable model. I had already come to Reddit when they went too far and there was a sudden enormous migration from Digg to Reddit. Digg went from being THE social media aggregator to being nothing in a matter of weeks.
Reddit is more deeply rooted, so I think it will stick around. I'm cool with Reddit keeping those who are happy with the corporate model busy so we can do our thing here.
Depends on Google. These tech companies don't like new platforms, especially those competing with established ones like Reddit. You'll see that Google often discriminates against Lemmy or Mastodon.
If you want to find the best instant rice recommendations on Lemmy, Lemmy should have a functional post search function, rather than me relying on a malevolent corporate entity like google to index all the content.
Search has gone to shit as the Internet has embraced social media sites. An upside of this is that Wikipedia + Lemmy + keyword search may be as accurate as asking Google Bard or Bing, and they can be built on entirely open tech.
Cool rage but you dismissing search indexing is kinda hilarious. It's not going away and it's what makes the web. Would you rather have 3 big websites instead of indexed web?
"Would you rather have 3 big websites instead of indexed web?"
That's what we already have. If you need to find stuff by doing site-specific Google searches, both Google and that site have failed.
The web is dead, and it's been dead for a while. Now is the time to build something new in its wake: rather than depending on closed-source algorithms indexing 3 big websites, we could just search the 3 big websites directly.
@QuinicV Why would it not be possible? It depends on the software and whether all text is open to be indexed. Kbin and Lemmy instances are basically open forum software and are indexed by search engines. You can test it in Google or other engines by restricting the search to one site, for example with the query site:lemmy.world are posts indexed? That would return an empty result if the content were locked down like Discord content.
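To spare yourself from typing site: filters by hand, something like the sketch below could build the query for a list of instances. It assumes the search engine accepts several site: filters joined with OR inside parentheses (Google generally does, other engines may differ). The function names and the instance list are only examples.

```python
from urllib.parse import urlencode


def site_query(terms: str, instances: list[str]) -> str:
    """Build a query that restricts `terms` to the given instance hosts."""
    if not instances:
        return terms
    sites = " OR ".join(f"site:{host}" for host in instances)
    # Parentheses keep the OR group separate from the search terms.
    return f"{terms} ({sites})" if len(instances) > 1 else f"{terms} {sites}"


def google_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = site_query("instant rice", ["lemmy.world", "beehaw.org"])
    print(query)
    print(google_url(query))
```

This still means keeping your own list of instances up to date, which is exactly the maintenance problem raised earlier in the thread.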
But what if the post I'm searching for is not on lemmy.world? Say the instance doesn't even have Lemmy in their name, like beehaw.org. How would a search engine index it? How would it know it's part of Lemmy?
There will be links to everything somewhere. The same way you knew to get the cave in the same way you know to get to Lemmy. There are already links posted to Reddit, and preserved in archives, that are easy to follow. Google doesn't just search one or two things; it follows all the links to those things, and then the links from those things to other things. If Google can't figure out how to get to it, chances are you don't know it's there either.
This is a case for search engine optimization, or SEO. Including keywords and tags in your pages' metadata, and letting crawlers in through robots.txt (which controls crawler access rather than holding keywords), is an important part of making sure you exist to the rest of the internet if you want to be findable via search engine. "Lemmy" as a term will almost certainly be one of the standard tags applied to any instance that knows even slightly what they're doing, or that has been given a tutorial by someone who does. I expect the same for terms like "beehaw" and whatever other large servers prove themselves popular and robust enough to stand the test of time.
@QuinicV This was just an example how to prove that the content from Lemmy is indexed and searchable by Google. If you do a websearch without limiting to a specific domain, then it will search through all indexed Lemmy content that is known to Google too. At the moment there is no way to search Lemmy (or related) content only.
What we need is a search engine that only tracks ActivityPub content from Lemmy, Kbin and Mastodon (and others). Let's call it ActivitySearch. Maybe SearX engine could be modified to do this.
I actually found Reddit by googling things. I had seen it 5 or 6 times over a few years, and eventually I just went to the main site. I might have even used Reddit in the search before I joined. Regardless, I had recognized that all the best answers for tricky problems that I had were coming from Reddit before I even joined 11 years ago.
Everyone's experience on this will be different, but I personally started using Reddit about 12 years or so ago, largely because at that point a lot of my Google searches were already pointing me towards Reddit. I wasn't necessarily going to Google specifically to find Reddit results, but since that's where I kept ending up I figured I might as well go straight to Reddit. And since Reddit's search function is and has always been trash, I pretty much immediately started using Google to search Reddit.
I realize Lemmy needs to get much bigger for that to happen. My question was more directed at how search engines would handle the fediverse. Though I see now that that wasn't very clear.
I believe that DDG has a shorthand for site:Reddit (without the .com). If Lemmy gets popular enough, DDG may implement a similar shorthand that incorporates the fediverse without us having to use a massive string. If it gets big enough, we may not have to solve this problem, because others will see the value in making it easy.
I wish there was a way to get an entire Reddit archive over here. Realistically I'm still going to have to search Reddit because it has 10+ years of answers to obscure questions.
Minds more intelligent than mine are probably already at work on these problems. I've seen multiple discussions where people say they are designing and working on solutions. It may take some time to see results, though.
I have seen at least one user claim they got a result from Lemmy when searching a question on Google. YMMV though. Lemmy is a fraction of the size of Reddit; it will take time for posts to reach the level where Google starts indexing them specifically.
Use the exclusion keyword for your search provider. For example, on Google, lemmy -kilmister -motorhead will get you only Lemmy software results by excluding pages with "kilmister" or "motorhead" in their contents.
Ok, not a stupid question - but annoying to assume that only Google is relevant.
Also, annoying that you'll assume that searching 'instant rice' will pull results from Lemmy. Even searching Lemmy for 'instant rice' brings zero results.
Instant rice simply had all of the good parts milled out
If you really are interested, I'd skip it entirely - 'instant' rice is basically rice that got everything milled out of it, then it's cooked, then dried - it costs a lot more and tastes like shit.
So I hope my answer will come up in your next search...
However, searching for 'sending epub files to my kindle' brings up quite a few... and down the list there, we see posts from 2022 in r/kindle, and entering reddit as an extra keyword pulls up more...
So really, we want to know: if we search for something which should have results on Lemmy.ml or Lemmy.world (and not only Lemmy; there are others, like BeeHaw), how long is it going to take before this gets picked up by SEARCH ENGINES? (Let's not say Google, or Reddit. These are bad habits unless there's a specific need to specify.)
You know that Google and Reddit own the internet - so just go there, and make sure you're using a Chromium based browser because that's the market share too.
That way, you can work to create a Fediverse whilst Google and friends succeed in closing all the loops to take total control of your internet and browser.
Other people will say they 'search'.
Part of the marketing strategy is to make Microsoft, Google, Reddit etc the default.
I use SearXNG, but I remember that one came via Qwant search and NOT via Google.
Language IS very important here, and it's important to avoid always using 'default' branded options which are not and should never be considered the default.
Correct me if I’m wrong but if individual admins allow their instances to be indexed wouldn’t the instance itself have some sort of metadata identifying it as a Lemmy branch?
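On the metadata question: many fediverse servers publish a NodeInfo document (discoverable through /.well-known/nodeinfo) that names the software they run, and as far as I know Lemmy instances report their software name there. Below is a small sketch of checking such a document once it has been fetched and parsed. The sample document is made up, and is_lemmy is a hypothetical helper rather than part of any library.

```python
def is_lemmy(nodeinfo: dict) -> bool:
    """Check whether a parsed NodeInfo document names Lemmy as its software."""
    software = nodeinfo.get("software", {})
    return str(software.get("name", "")).lower() == "lemmy"


# Made-up NodeInfo-style document, for illustration only.
SAMPLE = {
    "version": "2.0",
    "software": {"name": "lemmy", "version": "0.18.0"},
    "protocols": ["activitypub"],
}

if __name__ == "__main__":
    print(is_lemmy(SAMPLE))
    print(is_lemmy({"software": {"name": "mastodon"}}))
```

A crawler could use a check like this to recognise an instance such as beehaw.org as Lemmy even though "Lemmy" appears nowhere in the domain name.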
It's a difficult problem but not impossible in principle. One potentially good thing about Meta being involved is that if the user base is there, I'm pretty sure Google, with their resources and other big tech backing, will find a way to incorporate it into their engine.
I haven't really seen anyone say this, but I think it would be awesome for a new non-lemmy branch of the fediverse to be solely for things like grassroots reviews and recommendations. Of course, even as I say that, I'm highly aware of how easy it would be for it to just be full of shills there too.. it would be nice though. Every other spot on the internet seems to be all paid listicles with affiliate links. Everything is mass made in factories with no emphasis on quality, and the top 10 xyz things are just full of random things people found on amazon (and didn't necessarily test themselves). I hate it when I'm looking for quality. I'd love to see a different take on that that's more straightforward. Something more honest. Like everything we used the 'google + reddit' search for, but in a place where it isn't being paid for or controlled by conglomerates and information siphoning giants. A giant review site for everything under the sun, products and/or services, where you comment with your fediverse account for credibility. And no affiliate links. I'd love to know what anyone else thinks of this. :P
I read a post yesterday about some exec over there acknowledging that users are unhappy about search results lately, given that searches for UGC heavily prioritise Reddit. That has always been arbitrary, but it seems the problems associated with doing that were relatively minor until the blackout. Now that that's happened, they'll likely change approach.