Hi there,
I've seen a few videos on YouTube showing it off, and it looks incredibly powerful for fine-tuning the outputs of SD. It also looks dauntingly complicated to learn to use effectively.
For those of you who have played around with it: do you think it gives better results than A1111? Is it really better for fine-tuning? How steep was the learning curve for you?
I'm trying to figure out whether I want to put in the hours to learn it. If it improves my ability to get exactly the images I want, I'll go for it. If it does what A1111 does, just dressed up differently, I'll sit it out :)
ComfyUI seems incredibly powerful and efficient, and much faster than Automatic 1111. But I have yet to figure out how to get good results with ControlNet. I can make it work, but quality seems to get lost in ComfyUI. I don't know why yet, but I expect it is 'operator error'.
Once I work out the kinks, I expect to move to ComfyUI as my daily-driver Stable Diffusion interface, as it is so much faster, more resource-efficient, and more configurable.
I'm no stranger to node-based workflows, but I have struggled to see how nodes are beneficial for Stable Diffusion. It just seems like a ton of extra steps to lay down something like 10 nodes just to make a simple image, when other interfaces let me do the same thing a lot more easily.
When you say it's faster, are you referring to the workflow, or the actual generation? Do you see any other benefits from comfy?
The usefulness of ComfyUI is not just making one simple image. It is the ability to completely customize how that image is created.
For example, I have a workflow that generates a half-resolution preview image, then upscales the latent and puts it through two more sampling nodes. All three of the nodes have a different prompt input, with the focus slowly shifting to style instead of content.
I have also created a custom upscaling workflow, where the image is upscaled with normal upscaling, then re-encoded and put through just a few sampling steps, then decoded with a tiled VAE decoder (to save VRAM). It gives much better results (more detail and control) than a direct ESRGAN upscale, and the output can even be put through ESRGAN afterward to get a super large image.
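For anyone who wants to see what the preview-then-refine workflow above (half-resolution pass, latent upscale, two more sampling passes with their own prompts) might look like as a graph, here is a rough sketch in the style of ComfyUI's API-format JSON, built as a Python dict. Treat it as an illustration, not the poster's actual workflow: the node class names and input names are written from memory of ComfyUI's built-in nodes and should be checked against your install, and the node ids, checkpoint filename, step counts, and denoise values are all made-up placeholders.

```python
import json


def build_three_stage_graph(prompts, width=1024, height=1024, seed=0):
    """Build a graph dict: half-res preview, latent upscale, two refinement passes.

    `prompts` holds one positive prompt per sampling pass, so the wording can
    drift from content toward style. All ids and values are illustrative.
    """
    if len(prompts) != 3:
        raise ValueError("expected exactly three prompts, one per sampling pass")

    graph = {
        # Assumed outputs of the loader: 0 = model, 1 = CLIP, 2 = VAE.
        "ckpt": {"class_type": "CheckpointLoaderSimple",
                 "inputs": {"ckpt_name": "model.safetensors"}},  # placeholder file
        "neg": {"class_type": "CLIPTextEncode",
                "inputs": {"text": "", "clip": ["ckpt", 1]}},
        # Pass 1 starts from an empty latent at half the target resolution.
        "empty": {"class_type": "EmptyLatentImage",
                  "inputs": {"width": width // 2, "height": height // 2,
                             "batch_size": 1}},
        # The preview latent is upscaled to full size before passes 2 and 3.
        "upscale": {"class_type": "LatentUpscale",
                    "inputs": {"samples": ["sample_0", 0],
                               "upscale_method": "nearest-exact",
                               "width": width, "height": height,
                               "crop": "disabled"}},
    }

    latent_sources = [["empty", 0], ["upscale", 0], ["sample_1", 0]]
    denoise_values = [1.0, 0.6, 0.4]  # illustrative, tune to taste

    for i, (text, latent, denoise) in enumerate(
            zip(prompts, latent_sources, denoise_values)):
        graph[f"pos_{i}"] = {"class_type": "CLIPTextEncode",
                             "inputs": {"text": text, "clip": ["ckpt", 1]}}
        graph[f"sample_{i}"] = {"class_type": "KSampler", "inputs": {
            "model": ["ckpt", 0],
            "positive": [f"pos_{i}", 0],
            "negative": ["neg", 0],
            "latent_image": latent,
            "seed": seed, "steps": 20, "cfg": 7.0,
            "sampler_name": "euler", "scheduler": "normal",
            "denoise": denoise}}

    graph["decode"] = {"class_type": "VAEDecode",
                       "inputs": {"samples": ["sample_2", 0], "vae": ["ckpt", 2]}}
    graph["save"] = {"class_type": "SaveImage",
                     "inputs": {"images": ["decode", 0],
                                "filename_prefix": "three_stage"}}
    return graph


if __name__ == "__main__":
    demo = build_three_stage_graph([
        "a castle on a cliff",
        "a castle on a cliff, oil painting",
        "oil painting, thick brushstrokes",
    ])
    print(json.dumps(demo, indent=2))
```

The point of the sketch is only that each sampling pass is its own node with its own prompt and its own latent input, which is the kind of wiring a single-form interface does not expose directly.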
By faster I mean rendering: generation is quicker and less demanding on hardware, so ComfyUI seems far more efficient overall.
In terms of the node-based approach, what it does allow is fine-grained control, and even custom nodes, but as I said, if that is not your thing then ComfyBox provides a 'best of both worlds' option.
My advice is to do what I am currently doing, try ComfyUI, see how it compares for your particular use case or workflow, then decide from there. Personally I am keeping both Automatic 1111 and ComfyUI, sharing the key resources between both, while I make up my mind: https://weirdwonderfulai.art/resources/sd-automatic1111-webui-customisation-tips/
You can take a JSON file or a ComfyUI-generated image and just drag it onto the interface to set up a particular node structure, so in that respect ComfyUI is even quicker than Automatic 1111, although both pretty much have that functionality.
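Dragging an image onto the interface works because the workflow travels inside the image file itself. Here is a minimal, standard-library-only sketch of pulling it back out of a PNG. It assumes the metadata sits in uncompressed tEXt chunks under a key such as "workflow" (or "prompt"), which is my understanding of how ComfyUI saves images; the function names are my own, and compressed or international text chunks are not handled.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return all tEXt key/value pairs found in raw PNG bytes.

    Chunk CRCs are skipped, not verified, to keep the sketch short.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")

    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt body is: keyword, a NUL separator, then Latin-1 text.
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return texts


def extract_workflow(data: bytes, key: str = "workflow"):
    """Return the embedded JSON stored under `key`, or None if it is absent."""
    raw = read_png_text_chunks(data).get(key)
    return json.loads(raw) if raw is not None else None


if __name__ == "__main__":
    import sys

    # Usage: python extract_workflow.py image.png > workflow.json
    with open(sys.argv[1], "rb") as fh:
        workflow = extract_workflow(fh.read())
    print(json.dumps(workflow, indent=2))
```

If the key is missing the function returns None, which is what you would expect from an image that was not saved by ComfyUI or that had its metadata stripped by an image host.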
I've been fiddling around with it for a few days. At first it seemed like A1111 gave better results, but after more tries they both look pretty similar. One big plus for ComfyUI is that it's much faster than A1111 (3 times faster on my PC) and it's also lighter. I can do a 1024x1024 upscale without getting an OOM error.
I first tried it a few days ago and I'm still a bit lost. Inpainting, which is the major part of my workflow, doesn't feel as swift as in Automatic1111, and I'm still searching for only-masked-area inpainting in ComfyUI.
But I can confirm it is much faster and uses less VRAM. And I somehow love the ability to save the entire workflow to a JSON file. What I'm missing the most is my prompt-autocomplete plugin.