On the face of it, the ability to run models larger than GPU memory seems extremely valuable. Why did they give up? Not everyone has an 80GB GPU.
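For what it's worth, you can get this today via Hugging Face's accelerate integration. A minimal sketch, assuming the transformers and accelerate packages are installed (the model name and offload folder here are just placeholders, not anything from the article):

```python
# Sketch: let accelerate shard a model across GPU, CPU RAM, and disk
# so it can run even when it exceeds a single GPU's memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the point is the placement strategy

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # place layers on GPU first, spill to CPU
    offload_folder="offload",  # layers that fit nowhere go to disk here
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

The tradeoff is speed: layers offloaded to CPU or disk get shuttled back to the GPU per forward pass, so it's workable for inference but painful for training.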
Maybe they aren't investing in advancing Watson as quickly as they used to, or perhaps they're rearchitecting. I'm trying to upgrade legacy transformers code to TF 2.0 and it's a big lift.
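For anyone who hasn't been through it: the lift comes from rewriting graph/session code into eager-style functions. A minimal before/after sketch (toy shapes and variable names are made up). The TF 1.x pattern, which still runs under compat.v1 as its own script:

```python
# TF 1.x style: build a static graph, then feed it through a session.
import tensorflow.compat.v1 as tf1
tf1.disable_eager_execution()

x = tf1.placeholder(tf1.float32, shape=[None, 4])  # symbolic input
w = tf1.get_variable("w", shape=[4, 2])
y = tf1.matmul(x, w)

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
```

And the TF 2.x equivalent, where eager execution is the default and tf.function traces a graph for you:

```python
# TF 2.x style: plain Python calls; @tf.function compiles a graph lazily.
import tensorflow as tf

w = tf.Variable(tf.random.normal([4, 2]))

@tf.function
def forward(x):
    return tf.matmul(x, w)

out = forward(tf.constant([[1.0, 2.0, 3.0, 4.0]]))
```

The mechanical translation is the easy part; untangling code that passed session handles and graph collections around is where it turns into a real project.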