No Stupid Questions @lemmy.world I Cast Fist @programming.dev 1y ago

How would one approach indexing pages for a search engine?

I mean, I know one way is to use a crawler that loads a number of known pages and attempts to follow all of their listed links, or at least the ones that lead to different domains, which is how I believe most engines started off.

But how would you find your way out of "bubbles"? Let's say that, following all the links from the sites you started from, none point to abc.xyz. How could you discover that site otherwise?
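
To make the "bubble" concrete, here's a rough sketch of the crawl I'm describing, run against a made-up in-memory link graph instead of the real web. The `LINKS` table, the `crawl` function and the `*.example` hostnames are all invented for illustration; a real crawler would fetch and parse pages rather than look them up in a dict.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example"],
    "abc.xyz": ["a.example"],  # links out, but nothing links in
}


def crawl(seeds, links):
    """Breadth-first crawl: return every page reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl(["a.example"], LINKS)
missed = set(LINKS) - found  # pages that exist but were never reached
print(sorted(found))
print(sorted(missed))
```

Starting from `a.example` alone, `abc.xyz` ends up in `missed`: it only gets crawled if it is in the seed list to begin with, so the question is really where those extra seeds would come from.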


5 comments

I found this an interesting read: https://www.marginalia.nu/log/63-marginalia-crawler/. There are lots of posts about the development of his search engine.