And what do you do about other clients? What happens when the user wants messages cleared from the server once they're fetched, but doesn't want that for the social network rooms? What about moderation?
XMPP is good at a very specific thing and I don't think its users would like all the necessary changes.
And my point is that your complaints are not about the protocol, but about the applications using it.
Not only that, we are talking about different layers of the OSI model. XMPP should be compared with HTTP, not ActivityPub. There is absolutely nothing stopping someone from implementing the ActivityStreams vocabulary on XMPP.
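To make that last point concrete, here is a minimal sketch of how an ActivityStreams object could ride inside an XMPP message stanza. Everything XMPP-specific here is an assumption for illustration: the payload namespace `urn:example:activitystreams:0`, the `activity` element name, and the helper names `wrap_activity` / `unwrap_activity` are hypothetical and not taken from any XEP or library. It only builds and parses the XML; a real client would send it over an XMPP connection.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payload namespace, made up for this sketch (not a registered XEP).
AS_PAYLOAD_NS = "urn:example:activitystreams:0"


def wrap_activity(activity: dict, sender: str, recipient: str) -> str:
    """Embed an ActivityStreams object (as JSON text) in a <message/> stanza."""
    # A real stanza would live in the 'jabber:client' namespace; omitted for brevity.
    message = ET.Element("message", {"from": sender, "to": recipient})
    payload = ET.SubElement(message, f"{{{AS_PAYLOAD_NS}}}activity")
    payload.text = json.dumps(activity)
    return ET.tostring(message, encoding="unicode")


def unwrap_activity(stanza: str) -> dict:
    """Extract the ActivityStreams object from a stanza built by wrap_activity."""
    message = ET.fromstring(stanza)
    payload = message.find(f"{{{AS_PAYLOAD_NS}}}activity")
    if payload is None or payload.text is None:
        raise ValueError("stanza carries no ActivityStreams payload")
    return json.loads(payload.text)


# Example addresses and content are placeholders.
note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "Hello from XMPP",
}
stanza = wrap_activity(note, "alice@example.org", "bob@example.net")
print(stanza)
print(unwrap_activity(stanza) == note)
```

The vocabulary is just data, so the transport only has to carry it and route it; whether it travels in an HTTP POST body or an XMPP stanza is a separate choice.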
From my understanding, developers often complain about ActivityPub. Can we really say that it "just works"? Federation and interop issues are common as well.
It may seem so due to its success, but I'd argue its success was more tied to luck and timing than technical superiority.
Mastodon was growing and "new" during the decline of mainstream social media, specifically while Twitter was having repeated moderation issues and, later, when it was acquired by Musk.
Similarly, Lemmy benefited from the Reddit third-party API fiasco.
I don't think this means that it works better than anything else. It was just the most obvious choice to users at the time.