Nvidia released a paper about a 100KB text-to-image model that was trained for only 4 minutes but is claimed to be better than bigger models

research.nvidia.com Key-Locked Rank One Editing for Text-to-Image Personalization

They also claim that it only takes about 8 seconds to generate various good images.

6 comments
  • Might want to clarify: the "model" in this case is not a full model like Stable Diffusion, but rather something applied like a patch on top of one, more comparable to a LoRA

    I don't think that anyone would misunderstand anyway, but better safe than sorry
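
    To make the "patch" idea concrete, here is a rough sketch of a rank-one update added on top of a frozen weight matrix, plus a count of how few values such a patch stores. This is only an illustration of the general low-rank idea, not the method from the paper; the function names and the 768x768 layer size are made up for the example.

```python
# Illustrative only: a rank-one "patch" applied to a frozen weight matrix.
# Function names and sizes are hypothetical, not taken from the paper.

def patch_param_count(d_out, d_in, rank):
    """Number of values stored by a rank-`rank` patch (two thin factors)."""
    return rank * (d_out + d_in)

def apply_rank_one_patch(weight, u, v):
    """Return weight + outer(u, v) without modifying the original matrix."""
    return [[w + ui * vj for w, vj in zip(row, v)]
            for row, ui in zip(weight, u)]

# Tiny 2x2 example so the arithmetic is easy to follow.
base = [[1.0, 0.0], [0.0, 1.0]]
patched = apply_rank_one_patch(base, u=[1.0, 2.0], v=[0.5, 0.0])

# Size comparison for one assumed 768x768 layer.
full_params = 768 * 768
patch_params = patch_param_count(768, 768, rank=1)
print(patched, full_params, patch_params)
```

    The base weights stay untouched and only the two small vectors need to be shipped, which is why the file can be tiny compared to a full checkpoint.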

    • That’s the real meat of this. The future of models will be these smaller, focused “patches” that have some kind of traceable lineage. At least when it comes to marketing and selling these.