NLP isn't an area I know much about, but I find it really interesting that they used GPT-4 as the scoring mechanism. I do have concerns about this after the recent MIT "100% OF QUESTIONS CORRECT" fiasco, but it will likely be no more biased than a human marker.