It literally might - stereo hearing picks up more vertical data that way (which the brain can combine with visual data into a fuller 3D understanding of things).
It's the same as how you (I mean e.g. cats) move your head from side to side when judging the distance or shape of objects slightly further away.
It's just a sensor adjustment to literally receive more data on the subject/object from various POVs.