Well, they did it because they're for-profit companies looking to wring money out of people's interactions. Lemmy is open source and, by nature, publicly and readily available to whatever observers want to federate. It would be dead simple to create an ActivityPub server that does nothing but listen and save a copy of everything everyone ever said or posted (at least on the instances you get it to federate with).
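To give an idea of how little the "listen and save" part needs, here is a rough sketch using only Python's standard library. The names `to_archive_line`, `InboxHandler`, and the file `archive.jsonl` are made up for illustration. A real instance would also have to serve an actor document, get other instances to federate with it, and verify HTTP signatures, none of which is shown here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

ARCHIVE_PATH = "archive.jsonl"  # hypothetical output file, one activity per line


def to_archive_line(raw_body: bytes) -> Optional[str]:
    """Return the activity as a single JSON line, or None if the body is not a JSON object."""
    try:
        activity = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return None
    if not isinstance(activity, dict):
        return None
    return json.dumps(activity) + "\n"


class InboxHandler(BaseHTTPRequestHandler):
    """Accepts anything POSTed to it and appends it to the archive file (no signature checks)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        line = to_archive_line(self.rfile.read(length))
        if line is not None:
            with open(ARCHIVE_PATH, "a", encoding="utf-8") as archive:
                archive.write(line)
        self.send_response(202 if line is not None else 400)
        self.end_headers()


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for the sketch.
    HTTPServer(("0.0.0.0", 8080), InboxHandler).serve_forever()
```

The parsing is kept in its own function so it can be tried without starting the server.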
Yeah, absolutely. I'm actually tempted to implement some kind of all-encompassing "all" instance that listens to all communities of all known Lemmy instances (to emulate something like Reddit's r/all for Lemmy). I think that would be pretty useful for getting an idea of the "size" of the Lemmy community. It would probably have no users of its own (also to avoid being blacklisted by other instances).
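For what it's worth, subscribing such an instance to communities boils down to sending each community an ActivityPub Follow activity. Below is a sketch of building those in Python. `build_follow`, `build_follows`, the ID scheme, and all URLs are hypothetical, delivery (a signed POST to each community's inbox) is left out, and whether Lemmy would accept follows from a user-less listener actor is an assumption I haven't verified.

```python
import uuid
from typing import Dict, Iterable, List

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_follow(actor: str, community: str) -> Dict[str, str]:
    """Build one Follow activity from the listener actor to a community actor."""
    return {
        "@context": AS_CONTEXT,
        "id": f"{actor}/follows/{uuid.uuid4()}",  # hypothetical ID scheme
        "type": "Follow",
        "actor": actor,
        "object": community,
    }


def build_follows(actor: str, communities: Iterable[str]) -> List[Dict[str, str]]:
    """Build one Follow per distinct community URL, keeping the input order."""
    return [build_follow(actor, c) for c in dict.fromkeys(communities)]
```

Duplicates are dropped up front so the same community is not asked twice, which matters if the list is merged from several instances' community listings.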
I wonder how stuff like this would affect the scalability of the fediverse. I think having one instance like you describe would be good, but having many of them could become an issue.
Yeah, but this is a general problem with ActivityPub: it builds on the goodwill of the people/instances.
But I don't think it will be a really big issue if there are really just a few of these instances (that is the premise, though).
But it's a good point: when something like this is open source, anyone could spin it up to kinda DDoS the fediverse...
But looking at it in a more positive light, it could probably also be used as an intermediate cache/index instance for other instances, which may save requests to the original instances.
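The cache idea could look roughly like this in Python: keep fetched objects for a while and only go back to the original instance when the copy is stale. `ObjectCache`, its parameters, and the five-minute default are assumptions for the sketch, not how Lemmy or any existing instance does it, and the actual HTTP fetch is passed in as a function.

```python
import time
from typing import Callable, Dict, Tuple


class ObjectCache:
    """Time-limited cache of federated objects, keyed by their URL."""

    def __init__(
        self,
        fetch_origin: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_origin = fetch_origin  # hypothetical function that asks the original instance
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        """Return the cached object if it is still fresh, otherwise refetch and store it."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        obj = self._fetch_origin(url)
        self._entries[url] = (now, obj)
        return obj
```

With something like this in front, repeated requests for the same post within the TTL hit the original instance once instead of every time.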