The news mod team has asked to step away from the project until we have a composite tool that polls multiple sources for a more balanced view.
It will take a few hours, but FOR NOW there won't be a bot reviewing sources.
The goal was simple: make biased sources easier to spot, and give you and the mods a better view of what we were looking at.
The mod team is in agreement: one source of truth isn't enough. We are working on a tool that gives a composite score from multiple sources, all open source.
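A rough sketch of what a composite score could look like, assuming each rating provider scores a domain on its own numeric scale. The provider names and scales below (`rater_a`, `rater_b`) are made up for illustration. The idea is to rescale everything to [-1, 1], average, and also report the spread so disagreement between providers stays visible instead of being averaged away.

```python
from statistics import mean, pstdev

# Hypothetical providers and their rating scales (min, max).
# Invented for illustration; real providers use different scales.
PROVIDER_SCALES = {
    "rater_a": (-10.0, 10.0),
    "rater_b": (0.0, 5.0),
}


def normalize(score, lo, hi):
    """Map a score from [lo, hi] onto [-1, 1]."""
    return 2.0 * (score - lo) / (hi - lo) - 1.0


def composite(ratings):
    """Combine one domain's ratings into (mean, spread).

    `ratings` maps provider name -> raw score on that provider's scale.
    Spread is the population standard deviation of the normalized scores,
    so a high value means the providers disagree.
    """
    normed = [normalize(s, *PROVIDER_SCALES[p]) for p, s in ratings.items()]
    return mean(normed), pstdev(normed)
```

For example, `composite({"rater_a": 0.0, "rater_b": 0.0})` gives a mean of -0.5 with a spread of 0.5: the two providers land at opposite ends of their shared range, which a bare average would hide.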
What I wish we had is a tool for showing which sources tend to be most statistically correlated with each other, without trying to place them on a linear spectrum.
I was thinking of something like the graph of subreddits from this paper, although I think that's based on active-user overlap, and I don't know if there's a similar metric that would cover all news sites.
I don't see an easy way to accomplish this without pulling in the full text of every article over some period, computing something like paragraph/doc/site vectors, and then clustering by site vector.
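For a sense of what the cheapest version of this would look like, here's a sketch that swaps learned doc vectors for plain normalized word counts per site, then groups sites by cosine similarity using single-linkage clustering. The input format and the `threshold` value are assumptions. A real attempt would use something like doc2vec, and nothing here separates boilerplate (bylines, date formats) from substance.

```python
import math
from collections import Counter


def site_vector(articles):
    """Crude site vector: unit-length word counts over all of a site's articles.

    A stand-in for learned paragraph/doc vectors; it captures vocabulary only.
    """
    counts = Counter()
    for text in articles:
        counts.update(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {w: c / norm for w, c in counts.items()}


def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def cluster_sites(site_articles, threshold=0.5):
    """Single-linkage clustering: merge sites whose cosine similarity >= threshold.

    `site_articles` maps site name -> list of article texts.
    """
    sites = sorted(site_articles)
    vecs = {s: site_vector(site_articles[s]) for s in sites}
    parent = {s: s for s in sites}  # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for s in sites:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```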
That's putting a lot of faith in unsupervised learning, and it's probably just as likely to pick up on stylistic conventions like byline and date formats as it is to cluster by some common thematic pattern like political leaning.
Maybe you could use a source site’s posts and upvotes in different fediverse communities as a proxy (assuming you could find representative communities with a similar range of biases).
That's...actually not a bad idea. Take the (user, domain name) pairs and weight the edge between two domains by the number of unique users who posted from both of them.
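A minimal sketch of that edge weighting, assuming you already have (user, domain) pairs pulled from post submissions (the input format is hypothetical). Each user adds at most 1 to any given domain pair, so the weight is exactly the count of unique users who posted from both.

```python
from collections import defaultdict
from itertools import combinations


def co_posting_graph(posts):
    """Build a weighted domain graph from (user, domain) pairs.

    Returns {(domain_a, domain_b): weight} with domain_a < domain_b, where
    weight = number of unique users who posted links from both domains.
    """
    domains_by_user = defaultdict(set)
    for user, domain in posts:
        domains_by_user[user].add(domain)  # set: repeat posts count once

    weights = defaultdict(int)
    for domains in domains_by_user.values():
        # Each user contributes at most 1 to each domain pair.
        for a, b in combinations(sorted(domains), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

Raw counts will favor big domains that everyone posts. Dividing by the size of the union of the two domains' user sets (Jaccard similarity) is one common way to correct for that.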
Producing clusters from the resulting graph should be easy, but beyond saying "these are similar websites", does it really tell us much?
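One easy clustering approach, sketched under assumptions: drop edges below a minimum shared-user count (`min_weight` is an arbitrary cutoff here) and take the connected components of what remains. It expects a `(domain_a, domain_b) -> weight` mapping like the one described above.

```python
def clusters(weights, min_weight=2):
    """Group domains into connected components, ignoring weak edges.

    `weights` maps (domain_a, domain_b) -> shared-user count.
    Returns a list of sorted domain lists.
    """
    adj = {}
    for (a, b), w in weights.items():
        # Register both domains even if all their edges turn out to be weak.
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)

    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search over the strong edges.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups
```

Fancier community detection (Louvain, label propagation) would avoid the hard cutoff, but the output is the same kind of thing: groups of domains, with no explanation of why they belong together.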
You could do something similar with comment-, upvote-, and downvote-based linkages; maybe those would carry some deeper semantic meaning.
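A sketch of one vote-based linkage, assuming vote records arrive as hypothetical `(user, domain, +1 or -1)` tuples: treat each domain as a vector of per-user net votes and compare domains by cosine similarity. Unlike co-posting counts, this can go negative, which would at least show which sources the same people tend to reject.

```python
import math
from collections import defaultdict


def vote_profiles(votes):
    """Net vote per user for each domain, from (user, domain, +1/-1) records."""
    profiles = defaultdict(lambda: defaultdict(int))
    for user, domain, vote in votes:
        profiles[domain][user] += vote
    return profiles


def vote_similarity(profiles, a, b):
    """Cosine similarity of two domains' per-user net votes.

    Near +1: the same users tend to vote both domains the same way.
    Near -1: users who upvote one tend to downvote the other.
    Returns 0.0 if either domain has no nonzero votes.
    """
    pa, pb = profiles[a], profiles[b]
    dot = sum(pa[u] * pb[u] for u in pa.keys() & pb.keys())
    na = math.sqrt(sum(v * v for v in pa.values()))
    nb = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (na * nb) if na and nb else 0.0
```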