Really, the metric we should be looking at is (f+a)/b, where f is some subjective weighted measure of functionality, a is a measure of aesthetic value, and b describes the bloat.
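To make that concrete, here is a minimal sketch of the metric in Python. The function name `distro_score` and the example numbers are made up purely for illustration; how you actually arrive at f, a and b is the subjective part and is not modelled here.

```python
def distro_score(f: float, a: float, b: float) -> float:
    """Return (f + a) / b.

    f: subjective weighted functionality score
    a: subjective aesthetic score
    b: bloat, assumed to be strictly positive
    """
    if b <= 0:
        # Assumption: bloat is measured on a positive scale, so b <= 0 is rejected.
        raise ValueError("bloat must be positive")
    return (f + a) / b


# Hypothetical scores: functionality 8, aesthetics 2, bloat 4.
print(distro_score(8.0, 2.0, 4.0))  # 2.5
```

Since f and a are simply added, a point of aesthetics counts exactly as much as a point of functionality unless the weights are baked into the scores themselves.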
That's a personal preference. Many people do care about aesthetics, and I don't think dismissing their taste is fair. Hence, if you wanted to add factors accounting for that preference, you'd have to define some additional variable for it.
If a is a subjective measure of aesthetic value - as it must be, since taste is a subjective thing - you might as well include the factor already.
If a is normalised against some fixed scale while bloat, which has effectively no upper limit, is impossible to normalise, it would be more reasonable to increase f instead, in order to model the fact that a larger distro may also come with more functionality.
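A rough sketch of the scaling issue, assuming a is capped at 1.0 and b can grow without bound (the cap and the bloat values below are illustrative assumptions, not anything measured): the most that aesthetics can ever add to (f+a)/b is 1/b, which shrinks as the distro grows, whereas f has no cap and can grow along with b.

```python
A_MAX = 1.0  # assumed upper end of the normalised aesthetic scale


def max_aesthetic_contribution(b: float) -> float:
    """Largest possible share of the score (f + a) / b that comes from a."""
    if b <= 0:
        raise ValueError("bloat must be positive")
    return A_MAX / b


# Hypothetical bloat values for a small, medium and large distro.
for b in (1.0, 10.0, 100.0):
    print(b, max_aesthetic_contribution(b))
```

With these made-up values the ceiling on the aesthetic term drops from 1.0 to 0.1 to 0.01, which is why any extra value of a larger distro has to show up in f.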