
Equal or better quality? I suppose it depends on what you are trying to create, but that hasn't been my experience at all.


Which one are you comparing against? I've tried hundreds of prompts between SD and DALL-E and get comparable results. Midjourney was lagging for a while, but the new --testp parameter is really remarkable; in my view it makes Midjourney superior not only to Stable Diffusion but to DALL-E as well.


An easy example of DALL-E superiority is its ability to combine two different concepts together.

For example, DALL-E performs extremely impressively on prompts in the format of “a still of Homer Simpson in The Godfather” (replace character and movie as you wish). With the other two it’s a lot of misses.


With StableDiffusion I can buy a used RTX 3090 on eBay for $650, tell the model to generate 5,000 images, and then review each one until I find what it is I'm looking for.

Turns out a shitload of misses are acceptable when it only takes 4-7 seconds to generate an image from a prompt. 5,000 generations on an RTX 3090 take around 7 hours, +/- 30 minutes, by the way.
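A quick sanity check of that runtime claim, using just the numbers above:

```python
# 5,000 generations at the quoted 4-7 seconds per image on an RTX 3090
n_images = 5000

for secs_per_image in (4, 5, 7):
    hours = n_images * secs_per_image / 3600
    print(f"{secs_per_image} s/image -> {hours:.1f} hours")
```

At ~5 s/image the batch lands right around 6.9 hours, which matches the "7 hours +/- 30 minutes" figure.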


What I've been doing is generating maybe 100 images, picking the best one, and then generating another 100 from that, using --init-image ("good" image file name) and --init-image-strength 0.2 (or so), either with the original prompt or a slightly tweaked one.

Those are the params I use in ImaginAIry, mileage may vary if you're using a different package.
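A sketch of that two-round loop with ImaginAIry's `imagine` CLI. Only `--init-image` and `--init-image-strength` are taken from the comment above; the batch flag and file paths are assumptions, so check `imagine --help` for your version:

```shell
# Round 1: generate a batch of candidates from the raw prompt
# (--repeats is an assumed flag name; verify with `imagine --help`)
imagine "ancient castle on a cliff, oil painting" --repeats 100

# Round 2: after picking a favourite by eye, refine around it using
# the params from the comment above (path is illustrative)
imagine "ancient castle on a cliff, oil painting" --repeats 100 \
    --init-image path/to/favourite.jpg \
    --init-image-strength 0.2
```

A low init-image strength like 0.2 keeps the composition of the chosen image while letting the prompt vary the details.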


It's a bit ironic to bring up a 7-hour RTX 3090 run as a cost saving, given that it's something like 3 kWh of electricity, which costs more than DALL-E's already outrageous prices.


Is your math ok?

3 kWh is like $0.50 USD ...

DALL-E would give you like two pictures for that price, LOL


I live in Texas. 3 kWh of power is $0.29 for me.


In France that would be 0.51 EUR ...
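Pulling this subthread's numbers together (the ~$0.17/kWh rate is implied by "3 kWh is like $0.50"), the marginal electricity cost works out to around a hundredth of a cent per image even at the priciest quoted rate:

```python
# Electricity cost per image for the 7-hour, ~3 kWh, 5,000-image run,
# at the per-kWh rates quoted in this subthread (currencies mixed as quoted).
kwh = 3.0
n_images = 5000
rates = {
    "~$0.17/kWh (3 kWh ~= $0.50)": 0.17,
    "Texas ($0.29 for 3 kWh)": 0.29 / 3,
    "France (0.51 EUR for 3 kWh)": 0.51 / 3,
}

for label, rate in rates.items():
    total = kwh * rate
    per_image_cents = total / n_images * 100
    print(f"{label}: {total:.2f} total, {per_image_cents:.4f} cents/image")
```

So the whole 5,000-image run costs roughly what DALL-E charges for a handful of generations.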


While this is likely true for this specific prompt, I think that cherry-picking a single prompt that DALL-E outperforms SD on is not super indicative of anything. I've conversely found a large number of prompts where SD outperforms DALL-E, either in aesthetic quality or just following directions! I think you'd really have to compare both of them across a large number of prompts of different types to be sure.


To say nothing of the fact that you have lots of sliders to configure just how closely or loosely it follows your prompt, plus a choice of sampling methods.

You can't just compare SD and DALL-E performance on prompts alone, because SD gives you a lot more levers to steer it in the direction you want.


Can you share the prompts you see where SD consistently outperforms DALL-E?


Sure, try this one:

> house interior, friendly, playful, video game, screenshot, mockup, birds-eye view, top down perspective, jrpg, 32 bit, pixel art, black background

SD absolutely demolishes DALL-E on this one. SD produces really nice-looking output, with a high degree of consistency. DALL-E produces incoherent nonsense.


Have you poked around lexica.art?


>An easy example of DALL-E superiority is its ability to combine two different concepts together.

This is a con for some prompts. As an example, I asked for a painting of an elephant and a dog drinking tea together. The result was a dog with an elephant nose next to a teapot.

A similar misfire was the word 'porcupine', which drew pigs, I guess because 'porc' is in it? Anyway, its idea-blending is a little too aggressive.


Start your prompt with "group photo of" and then list the elephant and the dog. Across many generations, "group photo" results in about 2x as many images keeping the subjects separate.


Yeah you're right that Stable Diffusion produces garbage for that prompt.

I'd love to see a site with lots of examples of the same prompt fed into various models, I assume someone has already made that.


> Yeah you're right that Stable Diffusion produces garbage for that prompt.

I dunno, I generated 20 images from that prompt locally and got three good ones[1].

[1] https://imgur.com/a/rZ6wOEF


What? None of the people in these images are even remotely recognizable as Homer Simpson.


What would you count as a pass then? A literal rendering of the cartoon Homer Simpson on top of a still from the actual Godfather film?


Check out DALL-E results for similar prompts:

https://twitter.com/Dalle2Pics/status/1534718848137560064?re...


Ah, that is much better, definitely.

I'll have to let the AI experts speculate on why SD goes nuts there, because it definitely knows what "The Godfather (1972)" means (if you ask for e.g. 'A still of Patrick Stewart in "The Godfather (1972)"' you get one, which I believe DALL-E can't do because of their facial restrictions?)


from dall-e: https://i.imgur.com/RHiOjuM.png

I would argue that none of these follow the prompt. They all represent a Godfather frame in Simpson style, which is not the same as placing Homer in a Godfather still.


My experience is that with prompts that fit within OpenAI's restrictive content policy, DALL-E's text2img results are usually much better. And I use SD like 95% of the time, so it's not that I'm simply more used to DALL-E.


I need some examples, because I don't really see it for the vast majority of use cases.


Here I wanted to illustrate the game Waffle[0]; the first attempt with DALL-E was pretty good, which wasn't true for SD:

https://labs.openai.com/s/rCzJwauuiaIj1Pd3IyJGaHS3

Here I wanted an illustration of a nuclear plant in a Japanese landscape; the first attempt with DALL-E produced multiple good results. I tried SD and MJ (back when MJ didn't use SD) as well, and had trouble even with multiple attempts:

https://labs.openai.com/s/FxhxtMFe3kFS8msV8vekRAJ3

There are others, but anyway I think my examples are not important, since it will always be easy to cherry-pick prompts that yield the best results in model X.

In my experience SD is good at producing (especially non-photo-realistic) art that looks pretty and DALL-E is better at following a specific prompt when I know what exactly I want.

Of course I recognise your experience might (and probably does) differ.

[0] - https://wafflegame.net/


Agreed; SD barely follows prompts at all.


> Agreed; SD barely follows prompts at all.

I would heartily disagree - I've generated ~6.5k images using SD locally and most of them could be linked to the prompt they came from.


Doesn’t ‘most of them could be linked to the prompt they came from’ strike you as damning with faint praise?


> SD barely follows prompts at all.

> ...and most of them could be linked to the prompt they came from.

You made it sound as if there is almost no connection between the prompt and the images and zimpenfish said that the majority could be linked, implying a strong connection. He/she doesn't have to be praising it at all to counter your claim.


Not hugely - e.g. taking the 38 prompts including "a painting by William Adolphe Bouguereau" (which is easily the worst of the modifiers for me), 10 of them I'd say were "no clue to the prompt". For the 56 Munch images, 54 were good and 2 were quibbles ("an isopod as an angel" had no isopod but did have an angelic human - is that a pass or no?)

(Which is probably better than you'd get from a human given the exact same prompts.)


Have you seen a decent tutorial for setting up SD locally? I've been using it through huggingface, but that seems pretty limited.


You can find a number of different guides over at the stable diffusion subreddit, from CLI to GUIs in different flavors.

https://www.reddit.com/r/StableDiffusion/comments/xcq819/dre...


No, sorry, but there's a whole bunch of one-click things now, I think?

I'm running it on Windows 10 using (a modified version of) https://github.com/bfirsh/stable-diffusion.git and Anaconda to create the environment from their `environment.yaml` (all of which was done using the normal `cmd` shell). Then to use it, I activate that env from `cmd` and switch into cygwin `bash` to run the `txt2img.py` script (because it's easier to script, etc.)

[edit: probably helps that I already had a working VQGAN-CLIP setup which meant all the CUDA stuff was already there. For that I followed https://www.youtube.com/watch?v=XH7ZP0__FXs which covered the CUDA installation for VQGAN-CLIP.]


There's a 1-click installer for running Stable Diffusion locally (Windows/Linux); it doesn't require anything pre-installed - https://github.com/cmdr2/stable-diffusion-ui#installation


Official repo is straightforward: https://github.com/CompVis/stable-diffusion

Have to admit I just started looking into it; maybe there are better options.
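For reference, the official repo's README boils down to roughly this (the `ldm` environment name and `--plms` sampler flag are from the CompVis README; the model checkpoint has to be downloaded separately from Hugging Face and placed where the repo expects it):

```shell
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion

# Create and activate the conda environment defined by the repo
conda env create -f environment.yaml
conda activate ldm

# After downloading the v1 weights and placing the checkpoint
# where the repo expects it, generate an image:
python scripts/txt2img.py \
    --prompt "a photograph of an astronaut riding a horse" --plms
```

The one-click installers mentioned above wrap essentially these same steps.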



